2018-08-10 19:12:11

by Jeffrey Lien

Subject: [PATCH] Performance Improvement in CRC16 Calculations.

This patch improves the performance of the CRC16 calculation done in read/write workloads
that use the T10 Type 1/2/3 guard field. For example, with sequential write workloads (one
thread/CPU of IO) we currently consume 100% of the CPU because the CRC16 computation is the
bottleneck. Today's block devices are considerably faster, but the CRC16 calculation prevents
users from utilizing their full throughput. To speed up the calculation and expose the block
device throughput, we replace the single-byte-per-iteration loop with one that processes 16
bytes per iteration, using a larger CRC table (16 x 256 entries) to match. The results below
show up to roughly 5x higher sequential write bandwidth, and smaller gains for sequential
reads, on big endian and little endian systems running the 4.18.0 kernel.
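
For readers unfamiliar with the technique, the following is a minimal standalone sketch of
a 16-bytes-per-iteration table lookup ("slicing-by-16") for the 0x8bb7 polynomial. It is
not the loop from this patch: the names crc_t10dif_init, crc_t10dif_bytewise,
crc_t10dif_slice16 and tbl are hypothetical, and the tables are generated at startup rather
than stored as constants. Entry [k][b] is the CRC of byte b followed by k zero bytes, and
the entries spot-checked by hand ([0][1], [1][1], [2][1]) agree with the table in the diff.

```c
#include <stddef.h>
#include <stdint.h>

#define T10_POLY 0x8BB7u  /* generator polynomial named in the patched file */
#define SLICES   16       /* bytes consumed per loop iteration */

/* Hypothetical runtime-generated equivalent of a [16][256] lookup table. */
static uint16_t tbl[SLICES][256];

/* tbl[0] is the classic byte-at-a-time table; tbl[k][b] is the CRC of
 * byte b followed by k zero bytes. Must be called once before use. */
static void crc_t10dif_init(void)
{
    for (unsigned b = 0; b < 256; b++) {
        uint16_t crc = (uint16_t)(b << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ T10_POLY)
                                  : (uint16_t)(crc << 1);
        tbl[0][b] = crc;
    }
    for (int k = 1; k < SLICES; k++)
        for (unsigned b = 0; b < 256; b++) {
            uint16_t prev = tbl[k - 1][b];
            /* Appending one zero byte to the message behind tbl[k-1][b]. */
            tbl[k][b] = (uint16_t)((prev << 8) ^ tbl[0][prev >> 8]);
        }
}

/* Reference: one table lookup per input byte. */
static uint16_t crc_t10dif_bytewise(uint16_t crc, const uint8_t *p, size_t len)
{
    while (len--)
        crc = (uint16_t)((crc << 8) ^ tbl[0][(crc >> 8) ^ *p++]);
    return crc;
}

/* Sliced variant: 16 independent lookups per iteration. The running CRC
 * is folded into the first two bytes of each block; byte i is followed
 * by (15 - i) further bytes in the block, hence table index 15 - i. */
static uint16_t crc_t10dif_slice16(uint16_t crc, const uint8_t *p, size_t len)
{
    while (len >= SLICES) {
        uint16_t next = tbl[15][p[0] ^ (crc >> 8)] ^
                        tbl[14][p[1] ^ (crc & 0xFFu)];
        for (int i = 2; i < SLICES; i++)
            next ^= tbl[15 - i][p[i]];
        crc = next;
        p += SLICES;
        len -= SLICES;
    }
    /* Fewer than 16 bytes left: finish one byte at a time. */
    return crc_t10dif_bytewise(crc, p, len);
}
```

Both functions return the same value for any buffer; the sliced loop only changes how many
table lookups can proceed independently per iteration, at the cost of a 16x larger table.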

FIO Sequential Write, 64K Block Size, Queue Depth 64
BE Base Kernel: bw=201.5 MiB/s
BE Modified CRC Calc: bw=968.1 MiB/s
4.80x performance improvement

LE Base Kernel: bw=357 MiB/s
LE Modified CRC Calc: bw=1964 MiB/s
5.51x performance improvement

FIO Sequential Read, 64K Block Size, Queue Depth 64
BE Base Kernel: bw=611.2 MiB/s
BE Modified CRC Calc: bw=684.9 MiB/s
1.12x performance improvement

LE Base Kernel: bw=797 MiB/s
LE Modified CRC Calc: bw=2730 MiB/s
3.42x performance improvement

Reviewed-by: Dave Darrington <[email protected]>
Reviewed-by: Jeff Furlong <[email protected]>
Signed-off-by: Jeff Lien <[email protected]>
---
crypto/crct10dif_common.c | 605 +++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 569 insertions(+), 36 deletions(-)

diff --git a/crypto/crct10dif_common.c b/crypto/crct10dif_common.c
index b2fab36..40e1d6c 100644
--- a/crypto/crct10dif_common.c
+++ b/crypto/crct10dif_common.c
@@ -32,47 +32,580 @@
* x^16 + x^15 + x^11 + x^9 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1
* gt: 0x8bb7
*/
-static const __u16 t10_dif_crc_table[256] = {
- 0x0000, 0x8BB7, 0x9CD9, 0x176E, 0xB205, 0x39B2, 0x2EDC, 0xA56B,
- 0xEFBD, 0x640A, 0x7364, 0xF8D3, 0x5DB8, 0xD60F, 0xC161, 0x4AD6,
- 0x54CD, 0xDF7A, 0xC814, 0x43A3, 0xE6C8, 0x6D7F, 0x7A11, 0xF1A6,
- 0xBB70, 0x30C7, 0x27A9, 0xAC1E, 0x0975, 0x82C2, 0x95AC, 0x1E1B,
- 0xA99A, 0x222D, 0x3543, 0xBEF4, 0x1B9F, 0x9028, 0x8746, 0x0CF1,
- 0x4627, 0xCD90, 0xDAFE, 0x5149, 0xF422, 0x7F95, 0x68FB, 0xE34C,
- 0xFD57, 0x76E0, 0x618E, 0xEA39, 0x4F52, 0xC4E5, 0xD38B, 0x583C,
- 0x12EA, 0x995D, 0x8E33, 0x0584, 0xA0EF, 0x2B58, 0x3C36, 0xB781,
- 0xD883, 0x5334, 0x445A, 0xCFED, 0x6A86, 0xE131, 0xF65F, 0x7DE8,
- 0x373E, 0xBC89, 0xABE7, 0x2050, 0x853B, 0x0E8C, 0x19E2, 0x9255,
- 0x8C4E, 0x07F9, 0x1097, 0x9B20, 0x3E4B, 0xB5FC, 0xA292, 0x2925,
- 0x63F3, 0xE844, 0xFF2A, 0x749D, 0xD1F6, 0x5A41, 0x4D2F, 0xC698,
- 0x7119, 0xFAAE, 0xEDC0, 0x6677, 0xC31C, 0x48AB, 0x5FC5, 0xD472,
- 0x9EA4, 0x1513, 0x027D, 0x89CA, 0x2CA1, 0xA716, 0xB078, 0x3BCF,
- 0x25D4, 0xAE63, 0xB90D, 0x32BA, 0x97D1, 0x1C66, 0x0B08, 0x80BF,
- 0xCA69, 0x41DE, 0x56B0, 0xDD07, 0x786C, 0xF3DB, 0xE4B5, 0x6F02,
- 0x3AB1, 0xB106, 0xA668, 0x2DDF, 0x88B4, 0x0303, 0x146D, 0x9FDA,
- 0xD50C, 0x5EBB, 0x49D5, 0xC262, 0x6709, 0xECBE, 0xFBD0, 0x7067,
- 0x6E7C, 0xE5CB, 0xF2A5, 0x7912, 0xDC79, 0x57CE, 0x40A0, 0xCB17,
- 0x81C1, 0x0A76, 0x1D18, 0x96AF, 0x33C4, 0xB873, 0xAF1D, 0x24AA,
- 0x932B, 0x189C, 0x0FF2, 0x8445, 0x212E, 0xAA99, 0xBDF7, 0x3640,
- 0x7C96, 0xF721, 0xE04F, 0x6BF8, 0xCE93, 0x4524, 0x524A, 0xD9FD,
- 0xC7E6, 0x4C51, 0x5B3F, 0xD088, 0x75E3, 0xFE54, 0xE93A, 0x628D,
- 0x285B, 0xA3EC, 0xB482, 0x3F35, 0x9A5E, 0x11E9, 0x0687, 0x8D30,
- 0xE232, 0x6985, 0x7EEB, 0xF55C, 0x5037, 0xDB80, 0xCCEE, 0x4759,
- 0x0D8F, 0x8638, 0x9156, 0x1AE1, 0xBF8A, 0x343D, 0x2353, 0xA8E4,
- 0xB6FF, 0x3D48, 0x2A26, 0xA191, 0x04FA, 0x8F4D, 0x9823, 0x1394,
- 0x5942, 0xD2F5, 0xC59B, 0x4E2C, 0xEB47, 0x60F0, 0x779E, 0xFC29,
- 0x4BA8, 0xC01F, 0xD771, 0x5CC6, 0xF9AD, 0x721A, 0x6574, 0xEEC3,
- 0xA415, 0x2FA2, 0x38CC, 0xB37B, 0x1610, 0x9DA7, 0x8AC9, 0x017E,
- 0x1F65, 0x94D2, 0x83BC, 0x080B, 0xAD60, 0x26D7, 0x31B9, 0xBA0E,
- 0xF0D8, 0x7B6F, 0x6C01, 0xE7B6, 0x42DD, 0xC96A, 0xDE04, 0x55B3
+static const __u16 t10_dif_crc_table[16][256] = {
+ {
+ 0x0000u, 0x8BB7u, 0x9CD9u, 0x176Eu, 0xB205u, 0x39B2u, 0x2EDCu, 0xA56Bu,
+ 0xEFBDu, 0x640Au, 0x7364u, 0xF8D3u, 0x5DB8u, 0xD60Fu, 0xC161u, 0x4AD6u,
+ 0x54CDu, 0xDF7Au, 0xC814u, 0x43A3u, 0xE6C8u, 0x6D7Fu, 0x7A11u, 0xF1A6u,
+ 0xBB70u, 0x30C7u, 0x27A9u, 0xAC1Eu, 0x0975u, 0x82C2u, 0x95ACu, 0x1E1Bu,
+ 0xA99Au, 0x222Du, 0x3543u, 0xBEF4u, 0x1B9Fu, 0x9028u, 0x8746u, 0x0CF1u,
+ 0x4627u, 0xCD90u, 0xDAFEu, 0x5149u, 0xF422u, 0x7F95u, 0x68FBu, 0xE34Cu,
+ 0xFD57u, 0x76E0u, 0x618Eu, 0xEA39u, 0x4F52u, 0xC4E5u, 0xD38Bu, 0x583Cu,
+ 0x12EAu, 0x995Du, 0x8E33u, 0x0584u, 0xA0EFu, 0x2B58u, 0x3C36u, 0xB781u,
+ 0xD883u, 0x5334u, 0x445Au, 0xCFEDu, 0x6A86u, 0xE131u, 0xF65Fu, 0x7DE8u,
+ 0x373Eu, 0xBC89u, 0xABE7u, 0x2050u, 0x853Bu, 0x0E8Cu, 0x19E2u, 0x9255u,
+ 0x8C4Eu, 0x07F9u, 0x1097u, 0x9B20u, 0x3E4Bu, 0xB5FCu, 0xA292u, 0x2925u,
+ 0x63F3u, 0xE844u, 0xFF2Au, 0x749Du, 0xD1F6u, 0x5A41u, 0x4D2Fu, 0xC698u,
+ 0x7119u, 0xFAAEu, 0xEDC0u, 0x6677u, 0xC31Cu, 0x48ABu, 0x5FC5u, 0xD472u,
+ 0x9EA4u, 0x1513u, 0x027Du, 0x89CAu, 0x2CA1u, 0xA716u, 0xB078u, 0x3BCFu,
+ 0x25D4u, 0xAE63u, 0xB90Du, 0x32BAu, 0x97D1u, 0x1C66u, 0x0B08u, 0x80BFu,
+ 0xCA69u, 0x41DEu, 0x56B0u, 0xDD07u, 0x786Cu, 0xF3DBu, 0xE4B5u, 0x6F02u,
+ 0x3AB1u, 0xB106u, 0xA668u, 0x2DDFu, 0x88B4u, 0x0303u, 0x146Du, 0x9FDAu,
+ 0xD50Cu, 0x5EBBu, 0x49D5u, 0xC262u, 0x6709u, 0xECBEu, 0xFBD0u, 0x7067u,
+ 0x6E7Cu, 0xE5CBu, 0xF2A5u, 0x7912u, 0xDC79u, 0x57CEu, 0x40A0u, 0xCB17u,
+ 0x81C1u, 0x0A76u, 0x1D18u, 0x96AFu, 0x33C4u, 0xB873u, 0xAF1Du, 0x24AAu,
+ 0x932Bu, 0x189Cu, 0x0FF2u, 0x8445u, 0x212Eu, 0xAA99u, 0xBDF7u, 0x3640u,
+ 0x7C96u, 0xF721u, 0xE04Fu, 0x6BF8u, 0xCE93u, 0x4524u, 0x524Au, 0xD9FDu,
+ 0xC7E6u, 0x4C51u, 0x5B3Fu, 0xD088u, 0x75E3u, 0xFE54u, 0xE93Au, 0x628Du,
+ 0x285Bu, 0xA3ECu, 0xB482u, 0x3F35u, 0x9A5Eu, 0x11E9u, 0x0687u, 0x8D30u,
+ 0xE232u, 0x6985u, 0x7EEBu, 0xF55Cu, 0x5037u, 0xDB80u, 0xCCEEu, 0x4759u,
+ 0x0D8Fu, 0x8638u, 0x9156u, 0x1AE1u, 0xBF8Au, 0x343Du, 0x2353u, 0xA8E4u,
+ 0xB6FFu, 0x3D48u, 0x2A26u, 0xA191u, 0x04FAu, 0x8F4Du, 0x9823u, 0x1394u,
+ 0x5942u, 0xD2F5u, 0xC59Bu, 0x4E2Cu, 0xEB47u, 0x60F0u, 0x779Eu, 0xFC29u,
+ 0x4BA8u, 0xC01Fu, 0xD771u, 0x5CC6u, 0xF9ADu, 0x721Au, 0x6574u, 0xEEC3u,
+ 0xA415u, 0x2FA2u, 0x38CCu, 0xB37Bu, 0x1610u, 0x9DA7u, 0x8AC9u, 0x017Eu,
+ 0x1F65u, 0x94D2u, 0x83BCu, 0x080Bu, 0xAD60u, 0x26D7u, 0x31B9u, 0xBA0Eu,
+ 0xF0D8u, 0x7B6Fu, 0x6C01u, 0xE7B6u, 0x42DDu, 0xC96Au, 0xDE04u, 0x55B3u
+ },
+ {
+ 0x0000u, 0x7562u, 0xEAC4u, 0x9FA6u, 0x5E3Fu, 0x2B5Du, 0xB4FBu, 0xC199u,
+ 0xBC7Eu, 0xC91Cu, 0x56BAu, 0x23D8u, 0xE241u, 0x9723u, 0x0885u, 0x7DE7u,
+ 0xF34Bu, 0x8629u, 0x198Fu, 0x6CEDu, 0xAD74u, 0xD816u, 0x47B0u, 0x32D2u,
+ 0x4F35u, 0x3A57u, 0xA5F1u, 0xD093u, 0x110Au, 0x6468u, 0xFBCEu, 0x8EACu,
+ 0x6D21u, 0x1843u, 0x87E5u, 0xF287u, 0x331Eu, 0x467Cu, 0xD9DAu, 0xACB8u,
+ 0xD15Fu, 0xA43Du, 0x3B9Bu, 0x4EF9u, 0x8F60u, 0xFA02u, 0x65A4u, 0x10C6u,
+ 0x9E6Au, 0xEB08u, 0x74AEu, 0x01CCu, 0xC055u, 0xB537u, 0x2A91u, 0x5FF3u,
+ 0x2214u, 0x5776u, 0xC8D0u, 0xBDB2u, 0x7C2Bu, 0x0949u, 0x96EFu, 0xE38Du,
+ 0xDA42u, 0xAF20u, 0x3086u, 0x45E4u, 0x847Du, 0xF11Fu, 0x6EB9u, 0x1BDBu,
+ 0x663Cu, 0x135Eu, 0x8CF8u, 0xF99Au, 0x3803u, 0x4D61u, 0xD2C7u, 0xA7A5u,
+ 0x2909u, 0x5C6Bu, 0xC3CDu, 0xB6AFu, 0x7736u, 0x0254u, 0x9DF2u, 0xE890u,
+ 0x9577u, 0xE015u, 0x7FB3u, 0x0AD1u, 0xCB48u, 0xBE2Au, 0x218Cu, 0x54EEu,
+ 0xB763u, 0xC201u, 0x5DA7u, 0x28C5u, 0xE95Cu, 0x9C3Eu, 0x0398u, 0x76FAu,
+ 0x0B1Du, 0x7E7Fu, 0xE1D9u, 0x94BBu, 0x5522u, 0x2040u, 0xBFE6u, 0xCA84u,
+ 0x4428u, 0x314Au, 0xAEECu, 0xDB8Eu, 0x1A17u, 0x6F75u, 0xF0D3u, 0x85B1u,
+ 0xF856u, 0x8D34u, 0x1292u, 0x67F0u, 0xA669u, 0xD30Bu, 0x4CADu, 0x39CFu,
+ 0x3F33u, 0x4A51u, 0xD5F7u, 0xA095u, 0x610Cu, 0x146Eu, 0x8BC8u, 0xFEAAu,
+ 0x834Du, 0xF62Fu, 0x6989u, 0x1CEBu, 0xDD72u, 0xA810u, 0x37B6u, 0x42D4u,
+ 0xCC78u, 0xB91Au, 0x26BCu, 0x53DEu, 0x9247u, 0xE725u, 0x7883u, 0x0DE1u,
+ 0x7006u, 0x0564u, 0x9AC2u, 0xEFA0u, 0x2E39u, 0x5B5Bu, 0xC4FDu, 0xB19Fu,
+ 0x5212u, 0x2770u, 0xB8D6u, 0xCDB4u, 0x0C2Du, 0x794Fu, 0xE6E9u, 0x938Bu,
+ 0xEE6Cu, 0x9B0Eu, 0x04A8u, 0x71CAu, 0xB053u, 0xC531u, 0x5A97u, 0x2FF5u,
+ 0xA159u, 0xD43Bu, 0x4B9Du, 0x3EFFu, 0xFF66u, 0x8A04u, 0x15A2u, 0x60C0u,
+ 0x1D27u, 0x6845u, 0xF7E3u, 0x8281u, 0x4318u, 0x367Au, 0xA9DCu, 0xDCBEu,
+ 0xE571u, 0x9013u, 0x0FB5u, 0x7AD7u, 0xBB4Eu, 0xCE2Cu, 0x518Au, 0x24E8u,
+ 0x590Fu, 0x2C6Du, 0xB3CBu, 0xC6A9u, 0x0730u, 0x7252u, 0xEDF4u, 0x9896u,
+ 0x163Au, 0x6358u, 0xFCFEu, 0x899Cu, 0x4805u, 0x3D67u, 0xA2C1u, 0xD7A3u,
+ 0xAA44u, 0xDF26u, 0x4080u, 0x35E2u, 0xF47Bu, 0x8119u, 0x1EBFu, 0x6BDDu,
+ 0x8850u, 0xFD32u, 0x6294u, 0x17F6u, 0xD66Fu, 0xA30Du, 0x3CABu, 0x49C9u,
+ 0x342Eu, 0x414Cu, 0xDEEAu, 0xAB88u, 0x6A11u, 0x1F73u, 0x80D5u, 0xF5B7u,
+ 0x7B1Bu, 0x0E79u, 0x91DFu, 0xE4BDu, 0x2524u, 0x5046u, 0xCFE0u, 0xBA82u,
+ 0xC765u, 0xB207u, 0x2DA1u, 0x58C3u, 0x995Au, 0xEC38u, 0x739Eu, 0x06FCu
+ },
+ {
+ 0x0000u, 0x7E66u, 0xFCCCu, 0x82AAu, 0x722Fu, 0x0C49u, 0x8EE3u, 0xF085u,
+ 0xE45Eu, 0x9A38u, 0x1892u, 0x66F4u, 0x9671u, 0xE817u, 0x6ABDu, 0x14DBu,
+ 0x430Bu, 0x3D6Du, 0xBFC7u, 0xC1A1u, 0x3124u, 0x4F42u, 0xCDE8u, 0xB38Eu,
+ 0xA755u, 0xD933u, 0x5B99u, 0x25FFu, 0xD57Au, 0xAB1Cu, 0x29B6u, 0x57D0u,
+ 0x8616u, 0xF870u, 0x7ADAu, 0x04BCu, 0xF439u, 0x8A5Fu, 0x08F5u, 0x7693u,
+ 0x6248u, 0x1C2Eu, 0x9E84u, 0xE0E2u, 0x1067u, 0x6E01u, 0xECABu, 0x92CDu,
+ 0xC51Du, 0xBB7Bu, 0x39D1u, 0x47B7u, 0xB732u, 0xC954u, 0x4BFEu, 0x3598u,
+ 0x2143u, 0x5F25u, 0xDD8Fu, 0xA3E9u, 0x536Cu, 0x2D0Au, 0xAFA0u, 0xD1C6u,
+ 0x879Bu, 0xF9FDu, 0x7B57u, 0x0531u, 0xF5B4u, 0x8BD2u, 0x0978u, 0x771Eu,
+ 0x63C5u, 0x1DA3u, 0x9F09u, 0xE16Fu, 0x11EAu, 0x6F8Cu, 0xED26u, 0x9340u,
+ 0xC490u, 0xBAF6u, 0x385Cu, 0x463Au, 0xB6BFu, 0xC8D9u, 0x4A73u, 0x3415u,
+ 0x20CEu, 0x5EA8u, 0xDC02u, 0xA264u, 0x52E1u, 0x2C87u, 0xAE2Du, 0xD04Bu,
+ 0x018Du, 0x7FEBu, 0xFD41u, 0x8327u, 0x73A2u, 0x0DC4u, 0x8F6Eu, 0xF108u,
+ 0xE5D3u, 0x9BB5u, 0x191Fu, 0x6779u, 0x97FCu, 0xE99Au, 0x6B30u, 0x1556u,
+ 0x4286u, 0x3CE0u, 0xBE4Au, 0xC02Cu, 0x30A9u, 0x4ECFu, 0xCC65u, 0xB203u,
+ 0xA6D8u, 0xD8BEu, 0x5A14u, 0x2472u, 0xD4F7u, 0xAA91u, 0x283Bu, 0x565Du,
+ 0x8481u, 0xFAE7u, 0x784Du, 0x062Bu, 0xF6AEu, 0x88C8u, 0x0A62u, 0x7404u,
+ 0x60DFu, 0x1EB9u, 0x9C13u, 0xE275u, 0x12F0u, 0x6C96u, 0xEE3Cu, 0x905Au,
+ 0xC78Au, 0xB9ECu, 0x3B46u, 0x4520u, 0xB5A5u, 0xCBC3u, 0x4969u, 0x370Fu,
+ 0x23D4u, 0x5DB2u, 0xDF18u, 0xA17Eu, 0x51FBu, 0x2F9Du, 0xAD37u, 0xD351u,
+ 0x0297u, 0x7CF1u, 0xFE5Bu, 0x803Du, 0x70B8u, 0x0EDEu, 0x8C74u, 0xF212u,
+ 0xE6C9u, 0x98AFu, 0x1A05u, 0x6463u, 0x94E6u, 0xEA80u, 0x682Au, 0x164Cu,
+ 0x419Cu, 0x3FFAu, 0xBD50u, 0xC336u, 0x33B3u, 0x4DD5u, 0xCF7Fu, 0xB119u,
+ 0xA5C2u, 0xDBA4u, 0x590Eu, 0x2768u, 0xD7EDu, 0xA98Bu, 0x2B21u, 0x5547u,
+ 0x031Au, 0x7D7Cu, 0xFFD6u, 0x81B0u, 0x7135u, 0x0F53u, 0x8DF9u, 0xF39Fu,
+ 0xE744u, 0x9922u, 0x1B88u, 0x65EEu, 0x956Bu, 0xEB0Du, 0x69A7u, 0x17C1u,
+ 0x4011u, 0x3E77u, 0xBCDDu, 0xC2BBu, 0x323Eu, 0x4C58u, 0xCEF2u, 0xB094u,
+ 0xA44Fu, 0xDA29u, 0x5883u, 0x26E5u, 0xD660u, 0xA806u, 0x2AACu, 0x54CAu,
+ 0x850Cu, 0xFB6Au, 0x79C0u, 0x07A6u, 0xF723u, 0x8945u, 0x0BEFu, 0x7589u,
+ 0x6152u, 0x1F34u, 0x9D9Eu, 0xE3F8u, 0x137Du, 0x6D1Bu, 0xEFB1u, 0x91D7u,
+ 0xC607u, 0xB861u, 0x3ACBu, 0x44ADu, 0xB428u, 0xCA4Eu, 0x48E4u, 0x3682u,
+ 0x2259u, 0x5C3Fu, 0xDE95u, 0xA0F3u, 0x5076u, 0x2E10u, 0xACBAu, 0xD2DCu
+ },
+ {
+ 0x0000u, 0x82B5u, 0x8EDDu, 0x0C68u, 0x960Du, 0x14B8u, 0x18D0u, 0x9A65u,
+ 0xA7ADu, 0x2518u, 0x2970u, 0xABC5u, 0x31A0u, 0xB315u, 0xBF7Du, 0x3DC8u,
+ 0xC4EDu, 0x4658u, 0x4A30u, 0xC885u, 0x52E0u, 0xD055u, 0xDC3Du, 0x5E88u,
+ 0x6340u, 0xE1F5u, 0xED9Du, 0x6F28u, 0xF54Du, 0x77F8u, 0x7B90u, 0xF925u,
+ 0x026Du, 0x80D8u, 0x8CB0u, 0x0E05u, 0x9460u, 0x16D5u, 0x1ABDu, 0x9808u,
+ 0xA5C0u, 0x2775u, 0x2B1Du, 0xA9A8u, 0x33CDu, 0xB178u, 0xBD10u, 0x3FA5u,
+ 0xC680u, 0x4435u, 0x485Du, 0xCAE8u, 0x508Du, 0xD238u, 0xDE50u, 0x5CE5u,
+ 0x612Du, 0xE398u, 0xEFF0u, 0x6D45u, 0xF720u, 0x7595u, 0x79FDu, 0xFB48u,
+ 0x04DAu, 0x866Fu, 0x8A07u, 0x08B2u, 0x92D7u, 0x1062u, 0x1C0Au, 0x9EBFu,
+ 0xA377u, 0x21C2u, 0x2DAAu, 0xAF1Fu, 0x357Au, 0xB7CFu, 0xBBA7u, 0x3912u,
+ 0xC037u, 0x4282u, 0x4EEAu, 0xCC5Fu, 0x563Au, 0xD48Fu, 0xD8E7u, 0x5A52u,
+ 0x679Au, 0xE52Fu, 0xE947u, 0x6BF2u, 0xF197u, 0x7322u, 0x7F4Au, 0xFDFFu,
+ 0x06B7u, 0x8402u, 0x886Au, 0x0ADFu, 0x90BAu, 0x120Fu, 0x1E67u, 0x9CD2u,
+ 0xA11Au, 0x23AFu, 0x2FC7u, 0xAD72u, 0x3717u, 0xB5A2u, 0xB9CAu, 0x3B7Fu,
+ 0xC25Au, 0x40EFu, 0x4C87u, 0xCE32u, 0x5457u, 0xD6E2u, 0xDA8Au, 0x583Fu,
+ 0x65F7u, 0xE742u, 0xEB2Au, 0x699Fu, 0xF3FAu, 0x714Fu, 0x7D27u, 0xFF92u,
+ 0x09B4u, 0x8B01u, 0x8769u, 0x05DCu, 0x9FB9u, 0x1D0Cu, 0x1164u, 0x93D1u,
+ 0xAE19u, 0x2CACu, 0x20C4u, 0xA271u, 0x3814u, 0xBAA1u, 0xB6C9u, 0x347Cu,
+ 0xCD59u, 0x4FECu, 0x4384u, 0xC131u, 0x5B54u, 0xD9E1u, 0xD589u, 0x573Cu,
+ 0x6AF4u, 0xE841u, 0xE429u, 0x669Cu, 0xFCF9u, 0x7E4Cu, 0x7224u, 0xF091u,
+ 0x0BD9u, 0x896Cu, 0x8504u, 0x07B1u, 0x9DD4u, 0x1F61u, 0x1309u, 0x91BCu,
+ 0xAC74u, 0x2EC1u, 0x22A9u, 0xA01Cu, 0x3A79u, 0xB8CCu, 0xB4A4u, 0x3611u,
+ 0xCF34u, 0x4D81u, 0x41E9u, 0xC35Cu, 0x5939u, 0xDB8Cu, 0xD7E4u, 0x5551u,
+ 0x6899u, 0xEA2Cu, 0xE644u, 0x64F1u, 0xFE94u, 0x7C21u, 0x7049u, 0xF2FCu,
+ 0x0D6Eu, 0x8FDBu, 0x83B3u, 0x0106u, 0x9B63u, 0x19D6u, 0x15BEu, 0x970Bu,
+ 0xAAC3u, 0x2876u, 0x241Eu, 0xA6ABu, 0x3CCEu, 0xBE7Bu, 0xB213u, 0x30A6u,
+ 0xC983u, 0x4B36u, 0x475Eu, 0xC5EBu, 0x5F8Eu, 0xDD3Bu, 0xD153u, 0x53E6u,
+ 0x6E2Eu, 0xEC9Bu, 0xE0F3u, 0x6246u, 0xF823u, 0x7A96u, 0x76FEu, 0xF44Bu,
+ 0x0F03u, 0x8DB6u, 0x81DEu, 0x036Bu, 0x990Eu, 0x1BBBu, 0x17D3u, 0x9566u,
+ 0xA8AEu, 0x2A1Bu, 0x2673u, 0xA4C6u, 0x3EA3u, 0xBC16u, 0xB07Eu, 0x32CBu,
+ 0xCBEEu, 0x495Bu, 0x4533u, 0xC786u, 0x5DE3u, 0xDF56u, 0xD33Eu, 0x518Bu,
+ 0x6C43u, 0xEEF6u, 0xE29Eu, 0x602Bu, 0xFA4Eu, 0x78FBu, 0x7493u, 0xF626u
+ },
+ {
+ 0x0000u, 0x1368u, 0x26D0u, 0x35B8u, 0x4DA0u, 0x5EC8u, 0x6B70u, 0x7818u,
+ 0x9B40u, 0x8828u, 0xBD90u, 0xAEF8u, 0xD6E0u, 0xC588u, 0xF030u, 0xE358u,
+ 0xBD37u, 0xAE5Fu, 0x9BE7u, 0x888Fu, 0xF097u, 0xE3FFu, 0xD647u, 0xC52Fu,
+ 0x2677u, 0x351Fu, 0x00A7u, 0x13CFu, 0x6BD7u, 0x78BFu, 0x4D07u, 0x5E6Fu,
+ 0xF1D9u, 0xE2B1u, 0xD709u, 0xC461u, 0xBC79u, 0xAF11u, 0x9AA9u, 0x89C1u,
+ 0x6A99u, 0x79F1u, 0x4C49u, 0x5F21u, 0x2739u, 0x3451u, 0x01E9u, 0x1281u,
+ 0x4CEEu, 0x5F86u, 0x6A3Eu, 0x7956u, 0x014Eu, 0x1226u, 0x279Eu, 0x34F6u,
+ 0xD7AEu, 0xC4C6u, 0xF17Eu, 0xE216u, 0x9A0Eu, 0x8966u, 0xBCDEu, 0xAFB6u,
+ 0x6805u, 0x7B6Du, 0x4ED5u, 0x5DBDu, 0x25A5u, 0x36CDu, 0x0375u, 0x101Du,
+ 0xF345u, 0xE02Du, 0xD595u, 0xC6FDu, 0xBEE5u, 0xAD8Du, 0x9835u, 0x8B5Du,
+ 0xD532u, 0xC65Au, 0xF3E2u, 0xE08Au, 0x9892u, 0x8BFAu, 0xBE42u, 0xAD2Au,
+ 0x4E72u, 0x5D1Au, 0x68A2u, 0x7BCAu, 0x03D2u, 0x10BAu, 0x2502u, 0x366Au,
+ 0x99DCu, 0x8AB4u, 0xBF0Cu, 0xAC64u, 0xD47Cu, 0xC714u, 0xF2ACu, 0xE1C4u,
+ 0x029Cu, 0x11F4u, 0x244Cu, 0x3724u, 0x4F3Cu, 0x5C54u, 0x69ECu, 0x7A84u,
+ 0x24EBu, 0x3783u, 0x023Bu, 0x1153u, 0x694Bu, 0x7A23u, 0x4F9Bu, 0x5CF3u,
+ 0xBFABu, 0xACC3u, 0x997Bu, 0x8A13u, 0xF20Bu, 0xE163u, 0xD4DBu, 0xC7B3u,
+ 0xD00Au, 0xC362u, 0xF6DAu, 0xE5B2u, 0x9DAAu, 0x8EC2u, 0xBB7Au, 0xA812u,
+ 0x4B4Au, 0x5822u, 0x6D9Au, 0x7EF2u, 0x06EAu, 0x1582u, 0x203Au, 0x3352u,
+ 0x6D3Du, 0x7E55u, 0x4BEDu, 0x5885u, 0x209Du, 0x33F5u, 0x064Du, 0x1525u,
+ 0xF67Du, 0xE515u, 0xD0ADu, 0xC3C5u, 0xBBDDu, 0xA8B5u, 0x9D0Du, 0x8E65u,
+ 0x21D3u, 0x32BBu, 0x0703u, 0x146Bu, 0x6C73u, 0x7F1Bu, 0x4AA3u, 0x59CBu,
+ 0xBA93u, 0xA9FBu, 0x9C43u, 0x8F2Bu, 0xF733u, 0xE45Bu, 0xD1E3u, 0xC28Bu,
+ 0x9CE4u, 0x8F8Cu, 0xBA34u, 0xA95Cu, 0xD144u, 0xC22Cu, 0xF794u, 0xE4FCu,
+ 0x07A4u, 0x14CCu, 0x2174u, 0x321Cu, 0x4A04u, 0x596Cu, 0x6CD4u, 0x7FBCu,
+ 0xB80Fu, 0xAB67u, 0x9EDFu, 0x8DB7u, 0xF5AFu, 0xE6C7u, 0xD37Fu, 0xC017u,
+ 0x234Fu, 0x3027u, 0x059Fu, 0x16F7u, 0x6EEFu, 0x7D87u, 0x483Fu, 0x5B57u,
+ 0x0538u, 0x1650u, 0x23E8u, 0x3080u, 0x4898u, 0x5BF0u, 0x6E48u, 0x7D20u,
+ 0x9E78u, 0x8D10u, 0xB8A8u, 0xABC0u, 0xD3D8u, 0xC0B0u, 0xF508u, 0xE660u,
+ 0x49D6u, 0x5ABEu, 0x6F06u, 0x7C6Eu, 0x0476u, 0x171Eu, 0x22A6u, 0x31CEu,
+ 0xD296u, 0xC1FEu, 0xF446u, 0xE72Eu, 0x9F36u, 0x8C5Eu, 0xB9E6u, 0xAA8Eu,
+ 0xF4E1u, 0xE789u, 0xD231u, 0xC159u, 0xB941u, 0xAA29u, 0x9F91u, 0x8CF9u,
+ 0x6FA1u, 0x7CC9u, 0x4971u, 0x5A19u, 0x2201u, 0x3169u, 0x04D1u, 0x17B9u
+ },
+ {
+ 0x0000u, 0x2BA3u, 0x5746u, 0x7CE5u, 0xAE8Cu, 0x852Fu, 0xF9CAu, 0xD269u,
+ 0xD6AFu, 0xFD0Cu, 0x81E9u, 0xAA4Au, 0x7823u, 0x5380u, 0x2F65u, 0x04C6u,
+ 0x26E9u, 0x0D4Au, 0x71AFu, 0x5A0Cu, 0x8865u, 0xA3C6u, 0xDF23u, 0xF480u,
+ 0xF046u, 0xDBE5u, 0xA700u, 0x8CA3u, 0x5ECAu, 0x7569u, 0x098Cu, 0x222Fu,
+ 0x4DD2u, 0x6671u, 0x1A94u, 0x3137u, 0xE35Eu, 0xC8FDu, 0xB418u, 0x9FBBu,
+ 0x9B7Du, 0xB0DEu, 0xCC3Bu, 0xE798u, 0x35F1u, 0x1E52u, 0x62B7u, 0x4914u,
+ 0x6B3Bu, 0x4098u, 0x3C7Du, 0x17DEu, 0xC5B7u, 0xEE14u, 0x92F1u, 0xB952u,
+ 0xBD94u, 0x9637u, 0xEAD2u, 0xC171u, 0x1318u, 0x38BBu, 0x445Eu, 0x6FFDu,
+ 0x9BA4u, 0xB007u, 0xCCE2u, 0xE741u, 0x3528u, 0x1E8Bu, 0x626Eu, 0x49CDu,
+ 0x4D0Bu, 0x66A8u, 0x1A4Du, 0x31EEu, 0xE387u, 0xC824u, 0xB4C1u, 0x9F62u,
+ 0xBD4Du, 0x96EEu, 0xEA0Bu, 0xC1A8u, 0x13C1u, 0x3862u, 0x4487u, 0x6F24u,
+ 0x6BE2u, 0x4041u, 0x3CA4u, 0x1707u, 0xC56Eu, 0xEECDu, 0x9228u, 0xB98Bu,
+ 0xD676u, 0xFDD5u, 0x8130u, 0xAA93u, 0x78FAu, 0x5359u, 0x2FBCu, 0x041Fu,
+ 0x00D9u, 0x2B7Au, 0x579Fu, 0x7C3Cu, 0xAE55u, 0x85F6u, 0xF913u, 0xD2B0u,
+ 0xF09Fu, 0xDB3Cu, 0xA7D9u, 0x8C7Au, 0x5E13u, 0x75B0u, 0x0955u, 0x22F6u,
+ 0x2630u, 0x0D93u, 0x7176u, 0x5AD5u, 0x88BCu, 0xA31Fu, 0xDFFAu, 0xF459u,
+ 0xBCFFu, 0x975Cu, 0xEBB9u, 0xC01Au, 0x1273u, 0x39D0u, 0x4535u, 0x6E96u,
+ 0x6A50u, 0x41F3u, 0x3D16u, 0x16B5u, 0xC4DCu, 0xEF7Fu, 0x939Au, 0xB839u,
+ 0x9A16u, 0xB1B5u, 0xCD50u, 0xE6F3u, 0x349Au, 0x1F39u, 0x63DCu, 0x487Fu,
+ 0x4CB9u, 0x671Au, 0x1BFFu, 0x305Cu, 0xE235u, 0xC996u, 0xB573u, 0x9ED0u,
+ 0xF12Du, 0xDA8Eu, 0xA66Bu, 0x8DC8u, 0x5FA1u, 0x7402u, 0x08E7u, 0x2344u,
+ 0x2782u, 0x0C21u, 0x70C4u, 0x5B67u, 0x890Eu, 0xA2ADu, 0xDE48u, 0xF5EBu,
+ 0xD7C4u, 0xFC67u, 0x8082u, 0xAB21u, 0x7948u, 0x52EBu, 0x2E0Eu, 0x05ADu,
+ 0x016Bu, 0x2AC8u, 0x562Du, 0x7D8Eu, 0xAFE7u, 0x8444u, 0xF8A1u, 0xD302u,
+ 0x275Bu, 0x0CF8u, 0x701Du, 0x5BBEu, 0x89D7u, 0xA274u, 0xDE91u, 0xF532u,
+ 0xF1F4u, 0xDA57u, 0xA6B2u, 0x8D11u, 0x5F78u, 0x74DBu, 0x083Eu, 0x239Du,
+ 0x01B2u, 0x2A11u, 0x56F4u, 0x7D57u, 0xAF3Eu, 0x849Du, 0xF878u, 0xD3DBu,
+ 0xD71Du, 0xFCBEu, 0x805Bu, 0xABF8u, 0x7991u, 0x5232u, 0x2ED7u, 0x0574u,
+ 0x6A89u, 0x412Au, 0x3DCFu, 0x166Cu, 0xC405u, 0xEFA6u, 0x9343u, 0xB8E0u,
+ 0xBC26u, 0x9785u, 0xEB60u, 0xC0C3u, 0x12AAu, 0x3909u, 0x45ECu, 0x6E4Fu,
+ 0x4C60u, 0x67C3u, 0x1B26u, 0x3085u, 0xE2ECu, 0xC94Fu, 0xB5AAu, 0x9E09u,
+ 0x9ACFu, 0xB16Cu, 0xCD89u, 0xE62Au, 0x3443u, 0x1FE0u, 0x6305u, 0x48A6u
+ },
+ {
+ 0x0000u, 0xF249u, 0x6F25u, 0x9D6Cu, 0xDE4Au, 0x2C03u, 0xB16Fu, 0x4326u,
+ 0x3723u, 0xC56Au, 0x5806u, 0xAA4Fu, 0xE969u, 0x1B20u, 0x864Cu, 0x7405u,
+ 0x6E46u, 0x9C0Fu, 0x0163u, 0xF32Au, 0xB00Cu, 0x4245u, 0xDF29u, 0x2D60u,
+ 0x5965u, 0xAB2Cu, 0x3640u, 0xC409u, 0x872Fu, 0x7566u, 0xE80Au, 0x1A43u,
+ 0xDC8Cu, 0x2EC5u, 0xB3A9u, 0x41E0u, 0x02C6u, 0xF08Fu, 0x6DE3u, 0x9FAAu,
+ 0xEBAFu, 0x19E6u, 0x848Au, 0x76C3u, 0x35E5u, 0xC7ACu, 0x5AC0u, 0xA889u,
+ 0xB2CAu, 0x4083u, 0xDDEFu, 0x2FA6u, 0x6C80u, 0x9EC9u, 0x03A5u, 0xF1ECu,
+ 0x85E9u, 0x77A0u, 0xEACCu, 0x1885u, 0x5BA3u, 0xA9EAu, 0x3486u, 0xC6CFu,
+ 0x32AFu, 0xC0E6u, 0x5D8Au, 0xAFC3u, 0xECE5u, 0x1EACu, 0x83C0u, 0x7189u,
+ 0x058Cu, 0xF7C5u, 0x6AA9u, 0x98E0u, 0xDBC6u, 0x298Fu, 0xB4E3u, 0x46AAu,
+ 0x5CE9u, 0xAEA0u, 0x33CCu, 0xC185u, 0x82A3u, 0x70EAu, 0xED86u, 0x1FCFu,
+ 0x6BCAu, 0x9983u, 0x04EFu, 0xF6A6u, 0xB580u, 0x47C9u, 0xDAA5u, 0x28ECu,
+ 0xEE23u, 0x1C6Au, 0x8106u, 0x734Fu, 0x3069u, 0xC220u, 0x5F4Cu, 0xAD05u,
+ 0xD900u, 0x2B49u, 0xB625u, 0x446Cu, 0x074Au, 0xF503u, 0x686Fu, 0x9A26u,
+ 0x8065u, 0x722Cu, 0xEF40u, 0x1D09u, 0x5E2Fu, 0xAC66u, 0x310Au, 0xC343u,
+ 0xB746u, 0x450Fu, 0xD863u, 0x2A2Au, 0x690Cu, 0x9B45u, 0x0629u, 0xF460u,
+ 0x655Eu, 0x9717u, 0x0A7Bu, 0xF832u, 0xBB14u, 0x495Du, 0xD431u, 0x2678u,
+ 0x527Du, 0xA034u, 0x3D58u, 0xCF11u, 0x8C37u, 0x7E7Eu, 0xE312u, 0x115Bu,
+ 0x0B18u, 0xF951u, 0x643Du, 0x9674u, 0xD552u, 0x271Bu, 0xBA77u, 0x483Eu,
+ 0x3C3Bu, 0xCE72u, 0x531Eu, 0xA157u, 0xE271u, 0x1038u, 0x8D54u, 0x7F1Du,
+ 0xB9D2u, 0x4B9Bu, 0xD6F7u, 0x24BEu, 0x6798u, 0x95D1u, 0x08BDu, 0xFAF4u,
+ 0x8EF1u, 0x7CB8u, 0xE1D4u, 0x139Du, 0x50BBu, 0xA2F2u, 0x3F9Eu, 0xCDD7u,
+ 0xD794u, 0x25DDu, 0xB8B1u, 0x4AF8u, 0x09DEu, 0xFB97u, 0x66FBu, 0x94B2u,
+ 0xE0B7u, 0x12FEu, 0x8F92u, 0x7DDBu, 0x3EFDu, 0xCCB4u, 0x51D8u, 0xA391u,
+ 0x57F1u, 0xA5B8u, 0x38D4u, 0xCA9Du, 0x89BBu, 0x7BF2u, 0xE69Eu, 0x14D7u,
+ 0x60D2u, 0x929Bu, 0x0FF7u, 0xFDBEu, 0xBE98u, 0x4CD1u, 0xD1BDu, 0x23F4u,
+ 0x39B7u, 0xCBFEu, 0x5692u, 0xA4DBu, 0xE7FDu, 0x15B4u, 0x88D8u, 0x7A91u,
+ 0x0E94u, 0xFCDDu, 0x61B1u, 0x93F8u, 0xD0DEu, 0x2297u, 0xBFFBu, 0x4DB2u,
+ 0x8B7Du, 0x7934u, 0xE458u, 0x1611u, 0x5537u, 0xA77Eu, 0x3A12u, 0xC85Bu,
+ 0xBC5Eu, 0x4E17u, 0xD37Bu, 0x2132u, 0x6214u, 0x905Du, 0x0D31u, 0xFF78u,
+ 0xE53Bu, 0x1772u, 0x8A1Eu, 0x7857u, 0x3B71u, 0xC938u, 0x5454u, 0xA61Du,
+ 0xD218u, 0x2051u, 0xBD3Du, 0x4F74u, 0x0C52u, 0xFE1Bu, 0x6377u, 0x913Eu
+ },
+ {
+ 0x0000u, 0xCABCu, 0x1ECFu, 0xD473u, 0x3D9Eu, 0xF722u, 0x2351u, 0xE9EDu,
+ 0x7B3Cu, 0xB180u, 0x65F3u, 0xAF4Fu, 0x46A2u, 0x8C1Eu, 0x586Du, 0x92D1u,
+ 0xF678u, 0x3CC4u, 0xE8B7u, 0x220Bu, 0xCBE6u, 0x015Au, 0xD529u, 0x1F95u,
+ 0x8D44u, 0x47F8u, 0x938Bu, 0x5937u, 0xB0DAu, 0x7A66u, 0xAE15u, 0x64A9u,
+ 0x6747u, 0xADFBu, 0x7988u, 0xB334u, 0x5AD9u, 0x9065u, 0x4416u, 0x8EAAu,
+ 0x1C7Bu, 0xD6C7u, 0x02B4u, 0xC808u, 0x21E5u, 0xEB59u, 0x3F2Au, 0xF596u,
+ 0x913Fu, 0x5B83u, 0x8FF0u, 0x454Cu, 0xACA1u, 0x661Du, 0xB26Eu, 0x78D2u,
+ 0xEA03u, 0x20BFu, 0xF4CCu, 0x3E70u, 0xD79Du, 0x1D21u, 0xC952u, 0x03EEu,
+ 0xCE8Eu, 0x0432u, 0xD041u, 0x1AFDu, 0xF310u, 0x39ACu, 0xEDDFu, 0x2763u,
+ 0xB5B2u, 0x7F0Eu, 0xAB7Du, 0x61C1u, 0x882Cu, 0x4290u, 0x96E3u, 0x5C5Fu,
+ 0x38F6u, 0xF24Au, 0x2639u, 0xEC85u, 0x0568u, 0xCFD4u, 0x1BA7u, 0xD11Bu,
+ 0x43CAu, 0x8976u, 0x5D05u, 0x97B9u, 0x7E54u, 0xB4E8u, 0x609Bu, 0xAA27u,
+ 0xA9C9u, 0x6375u, 0xB706u, 0x7DBAu, 0x9457u, 0x5EEBu, 0x8A98u, 0x4024u,
+ 0xD2F5u, 0x1849u, 0xCC3Au, 0x0686u, 0xEF6Bu, 0x25D7u, 0xF1A4u, 0x3B18u,
+ 0x5FB1u, 0x950Du, 0x417Eu, 0x8BC2u, 0x622Fu, 0xA893u, 0x7CE0u, 0xB65Cu,
+ 0x248Du, 0xEE31u, 0x3A42u, 0xF0FEu, 0x1913u, 0xD3AFu, 0x07DCu, 0xCD60u,
+ 0x16ABu, 0xDC17u, 0x0864u, 0xC2D8u, 0x2B35u, 0xE189u, 0x35FAu, 0xFF46u,
+ 0x6D97u, 0xA72Bu, 0x7358u, 0xB9E4u, 0x5009u, 0x9AB5u, 0x4EC6u, 0x847Au,
+ 0xE0D3u, 0x2A6Fu, 0xFE1Cu, 0x34A0u, 0xDD4Du, 0x17F1u, 0xC382u, 0x093Eu,
+ 0x9BEFu, 0x5153u, 0x8520u, 0x4F9Cu, 0xA671u, 0x6CCDu, 0xB8BEu, 0x7202u,
+ 0x71ECu, 0xBB50u, 0x6F23u, 0xA59Fu, 0x4C72u, 0x86CEu, 0x52BDu, 0x9801u,
+ 0x0AD0u, 0xC06Cu, 0x141Fu, 0xDEA3u, 0x374Eu, 0xFDF2u, 0x2981u, 0xE33Du,
+ 0x8794u, 0x4D28u, 0x995Bu, 0x53E7u, 0xBA0Au, 0x70B6u, 0xA4C5u, 0x6E79u,
+ 0xFCA8u, 0x3614u, 0xE267u, 0x28DBu, 0xC136u, 0x0B8Au, 0xDFF9u, 0x1545u,
+ 0xD825u, 0x1299u, 0xC6EAu, 0x0C56u, 0xE5BBu, 0x2F07u, 0xFB74u, 0x31C8u,
+ 0xA319u, 0x69A5u, 0xBDD6u, 0x776Au, 0x9E87u, 0x543Bu, 0x8048u, 0x4AF4u,
+ 0x2E5Du, 0xE4E1u, 0x3092u, 0xFA2Eu, 0x13C3u, 0xD97Fu, 0x0D0Cu, 0xC7B0u,
+ 0x5561u, 0x9FDDu, 0x4BAEu, 0x8112u, 0x68FFu, 0xA243u, 0x7630u, 0xBC8Cu,
+ 0xBF62u, 0x75DEu, 0xA1ADu, 0x6B11u, 0x82FCu, 0x4840u, 0x9C33u, 0x568Fu,
+ 0xC45Eu, 0x0EE2u, 0xDA91u, 0x102Du, 0xF9C0u, 0x337Cu, 0xE70Fu, 0x2DB3u,
+ 0x491Au, 0x83A6u, 0x57D5u, 0x9D69u, 0x7484u, 0xBE38u, 0x6A4Bu, 0xA0F7u,
+ 0x3226u, 0xF89Au, 0x2CE9u, 0xE655u, 0x0FB8u, 0xC504u, 0x1177u, 0xDBCBu
+ },
+ {
+ 0x0000u, 0x2D56u, 0x5AACu, 0x77FAu, 0xB558u, 0x980Eu, 0xEFF4u, 0xC2A2u,
+ 0xE107u, 0xCC51u, 0xBBABu, 0x96FDu, 0x545Fu, 0x7909u, 0x0EF3u, 0x23A5u,
+ 0x49B9u, 0x64EFu, 0x1315u, 0x3E43u, 0xFCE1u, 0xD1B7u, 0xA64Du, 0x8B1Bu,
+ 0xA8BEu, 0x85E8u, 0xF212u, 0xDF44u, 0x1DE6u, 0x30B0u, 0x474Au, 0x6A1Cu,
+ 0x9372u, 0xBE24u, 0xC9DEu, 0xE488u, 0x262Au, 0x0B7Cu, 0x7C86u, 0x51D0u,
+ 0x7275u, 0x5F23u, 0x28D9u, 0x058Fu, 0xC72Du, 0xEA7Bu, 0x9D81u, 0xB0D7u,
+ 0xDACBu, 0xF79Du, 0x8067u, 0xAD31u, 0x6F93u, 0x42C5u, 0x353Fu, 0x1869u,
+ 0x3BCCu, 0x169Au, 0x6160u, 0x4C36u, 0x8E94u, 0xA3C2u, 0xD438u, 0xF96Eu,
+ 0xAD53u, 0x8005u, 0xF7FFu, 0xDAA9u, 0x180Bu, 0x355Du, 0x42A7u, 0x6FF1u,
+ 0x4C54u, 0x6102u, 0x16F8u, 0x3BAEu, 0xF90Cu, 0xD45Au, 0xA3A0u, 0x8EF6u,
+ 0xE4EAu, 0xC9BCu, 0xBE46u, 0x9310u, 0x51B2u, 0x7CE4u, 0x0B1Eu, 0x2648u,
+ 0x05EDu, 0x28BBu, 0x5F41u, 0x7217u, 0xB0B5u, 0x9DE3u, 0xEA19u, 0xC74Fu,
+ 0x3E21u, 0x1377u, 0x648Du, 0x49DBu, 0x8B79u, 0xA62Fu, 0xD1D5u, 0xFC83u,
+ 0xDF26u, 0xF270u, 0x858Au, 0xA8DCu, 0x6A7Eu, 0x4728u, 0x30D2u, 0x1D84u,
+ 0x7798u, 0x5ACEu, 0x2D34u, 0x0062u, 0xC2C0u, 0xEF96u, 0x986Cu, 0xB53Au,
+ 0x969Fu, 0xBBC9u, 0xCC33u, 0xE165u, 0x23C7u, 0x0E91u, 0x796Bu, 0x543Du,
+ 0xD111u, 0xFC47u, 0x8BBDu, 0xA6EBu, 0x6449u, 0x491Fu, 0x3EE5u, 0x13B3u,
+ 0x3016u, 0x1D40u, 0x6ABAu, 0x47ECu, 0x854Eu, 0xA818u, 0xDFE2u, 0xF2B4u,
+ 0x98A8u, 0xB5FEu, 0xC204u, 0xEF52u, 0x2DF0u, 0x00A6u, 0x775Cu, 0x5A0Au,
+ 0x79AFu, 0x54F9u, 0x2303u, 0x0E55u, 0xCCF7u, 0xE1A1u, 0x965Bu, 0xBB0Du,
+ 0x4263u, 0x6F35u, 0x18CFu, 0x3599u, 0xF73Bu, 0xDA6Du, 0xAD97u, 0x80C1u,
+ 0xA364u, 0x8E32u, 0xF9C8u, 0xD49Eu, 0x163Cu, 0x3B6Au, 0x4C90u, 0x61C6u,
+ 0x0BDAu, 0x268Cu, 0x5176u, 0x7C20u, 0xBE82u, 0x93D4u, 0xE42Eu, 0xC978u,
+ 0xEADDu, 0xC78Bu, 0xB071u, 0x9D27u, 0x5F85u, 0x72D3u, 0x0529u, 0x287Fu,
+ 0x7C42u, 0x5114u, 0x26EEu, 0x0BB8u, 0xC91Au, 0xE44Cu, 0x93B6u, 0xBEE0u,
+ 0x9D45u, 0xB013u, 0xC7E9u, 0xEABFu, 0x281Du, 0x054Bu, 0x72B1u, 0x5FE7u,
+ 0x35FBu, 0x18ADu, 0x6F57u, 0x4201u, 0x80A3u, 0xADF5u, 0xDA0Fu, 0xF759u,
+ 0xD4FCu, 0xF9AAu, 0x8E50u, 0xA306u, 0x61A4u, 0x4CF2u, 0x3B08u, 0x165Eu,
+ 0xEF30u, 0xC266u, 0xB59Cu, 0x98CAu, 0x5A68u, 0x773Eu, 0x00C4u, 0x2D92u,
+ 0x0E37u, 0x2361u, 0x549Bu, 0x79CDu, 0xBB6Fu, 0x9639u, 0xE1C3u, 0xCC95u,
+ 0xA689u, 0x8BDFu, 0xFC25u, 0xD173u, 0x13D1u, 0x3E87u, 0x497Du, 0x642Bu,
+ 0x478Eu, 0x6AD8u, 0x1D22u, 0x3074u, 0xF2D6u, 0xDF80u, 0xA87Au, 0x852Cu
+ },
+ {
+ 0x0000u, 0x2995u, 0x532Au, 0x7ABFu, 0xA654u, 0x8FC1u, 0xF57Eu, 0xDCEBu,
+ 0xC71Fu, 0xEE8Au, 0x9435u, 0xBDA0u, 0x614Bu, 0x48DEu, 0x3261u, 0x1BF4u,
+ 0x0589u, 0x2C1Cu, 0x56A3u, 0x7F36u, 0xA3DDu, 0x8A48u, 0xF0F7u, 0xD962u,
+ 0xC296u, 0xEB03u, 0x91BCu, 0xB829u, 0x64C2u, 0x4D57u, 0x37E8u, 0x1E7Du,
+ 0x0B12u, 0x2287u, 0x5838u, 0x71ADu, 0xAD46u, 0x84D3u, 0xFE6Cu, 0xD7F9u,
+ 0xCC0Du, 0xE598u, 0x9F27u, 0xB6B2u, 0x6A59u, 0x43CCu, 0x3973u, 0x10E6u,
+ 0x0E9Bu, 0x270Eu, 0x5DB1u, 0x7424u, 0xA8CFu, 0x815Au, 0xFBE5u, 0xD270u,
+ 0xC984u, 0xE011u, 0x9AAEu, 0xB33Bu, 0x6FD0u, 0x4645u, 0x3CFAu, 0x156Fu,
+ 0x1624u, 0x3FB1u, 0x450Eu, 0x6C9Bu, 0xB070u, 0x99E5u, 0xE35Au, 0xCACFu,
+ 0xD13Bu, 0xF8AEu, 0x8211u, 0xAB84u, 0x776Fu, 0x5EFAu, 0x2445u, 0x0DD0u,
+ 0x13ADu, 0x3A38u, 0x4087u, 0x6912u, 0xB5F9u, 0x9C6Cu, 0xE6D3u, 0xCF46u,
+ 0xD4B2u, 0xFD27u, 0x8798u, 0xAE0Du, 0x72E6u, 0x5B73u, 0x21CCu, 0x0859u,
+ 0x1D36u, 0x34A3u, 0x4E1Cu, 0x6789u, 0xBB62u, 0x92F7u, 0xE848u, 0xC1DDu,
+ 0xDA29u, 0xF3BCu, 0x8903u, 0xA096u, 0x7C7Du, 0x55E8u, 0x2F57u, 0x06C2u,
+ 0x18BFu, 0x312Au, 0x4B95u, 0x6200u, 0xBEEBu, 0x977Eu, 0xEDC1u, 0xC454u,
+ 0xDFA0u, 0xF635u, 0x8C8Au, 0xA51Fu, 0x79F4u, 0x5061u, 0x2ADEu, 0x034Bu,
+ 0x2C48u, 0x05DDu, 0x7F62u, 0x56F7u, 0x8A1Cu, 0xA389u, 0xD936u, 0xF0A3u,
+ 0xEB57u, 0xC2C2u, 0xB87Du, 0x91E8u, 0x4D03u, 0x6496u, 0x1E29u, 0x37BCu,
+ 0x29C1u, 0x0054u, 0x7AEBu, 0x537Eu, 0x8F95u, 0xA600u, 0xDCBFu, 0xF52Au,
+ 0xEEDEu, 0xC74Bu, 0xBDF4u, 0x9461u, 0x488Au, 0x611Fu, 0x1BA0u, 0x3235u,
+ 0x275Au, 0x0ECFu, 0x7470u, 0x5DE5u, 0x810Eu, 0xA89Bu, 0xD224u, 0xFBB1u,
+ 0xE045u, 0xC9D0u, 0xB36Fu, 0x9AFAu, 0x4611u, 0x6F84u, 0x153Bu, 0x3CAEu,
+ 0x22D3u, 0x0B46u, 0x71F9u, 0x586Cu, 0x8487u, 0xAD12u, 0xD7ADu, 0xFE38u,
+ 0xE5CCu, 0xCC59u, 0xB6E6u, 0x9F73u, 0x4398u, 0x6A0Du, 0x10B2u, 0x3927u,
+ 0x3A6Cu, 0x13F9u, 0x6946u, 0x40D3u, 0x9C38u, 0xB5ADu, 0xCF12u, 0xE687u,
+ 0xFD73u, 0xD4E6u, 0xAE59u, 0x87CCu, 0x5B27u, 0x72B2u, 0x080Du, 0x2198u,
+ 0x3FE5u, 0x1670u, 0x6CCFu, 0x455Au, 0x99B1u, 0xB024u, 0xCA9Bu, 0xE30Eu,
+ 0xF8FAu, 0xD16Fu, 0xABD0u, 0x8245u, 0x5EAEu, 0x773Bu, 0x0D84u, 0x2411u,
+ 0x317Eu, 0x18EBu, 0x6254u, 0x4BC1u, 0x972Au, 0xBEBFu, 0xC400u, 0xED95u,
+ 0xF661u, 0xDFF4u, 0xA54Bu, 0x8CDEu, 0x5035u, 0x79A0u, 0x031Fu, 0x2A8Au,
+ 0x34F7u, 0x1D62u, 0x67DDu, 0x4E48u, 0x92A3u, 0xBB36u, 0xC189u, 0xE81Cu,
+ 0xF3E8u, 0xDA7Du, 0xA0C2u, 0x8957u, 0x55BCu, 0x7C29u, 0x0696u, 0x2F03u
+ },
+ {
+ 0x0000u, 0x5890u, 0xB120u, 0xE9B0u, 0xE9F7u, 0xB167u, 0x58D7u, 0x0047u,
+ 0x5859u, 0x00C9u, 0xE979u, 0xB1E9u, 0xB1AEu, 0xE93Eu, 0x008Eu, 0x581Eu,
+ 0xB0B2u, 0xE822u, 0x0192u, 0x5902u, 0x5945u, 0x01D5u, 0xE865u, 0xB0F5u,
+ 0xE8EBu, 0xB07Bu, 0x59CBu, 0x015Bu, 0x011Cu, 0x598Cu, 0xB03Cu, 0xE8ACu,
+ 0xEAD3u, 0xB243u, 0x5BF3u, 0x0363u, 0x0324u, 0x5BB4u, 0xB204u, 0xEA94u,
+ 0xB28Au, 0xEA1Au, 0x03AAu, 0x5B3Au, 0x5B7Du, 0x03EDu, 0xEA5Du, 0xB2CDu,
+ 0x5A61u, 0x02F1u, 0xEB41u, 0xB3D1u, 0xB396u, 0xEB06u, 0x02B6u, 0x5A26u,
+ 0x0238u, 0x5AA8u, 0xB318u, 0xEB88u, 0xEBCFu, 0xB35Fu, 0x5AEFu, 0x027Fu,
+ 0x5E11u, 0x0681u, 0xEF31u, 0xB7A1u, 0xB7E6u, 0xEF76u, 0x06C6u, 0x5E56u,
+ 0x0648u, 0x5ED8u, 0xB768u, 0xEFF8u, 0xEFBFu, 0xB72Fu, 0x5E9Fu, 0x060Fu,
+ 0xEEA3u, 0xB633u, 0x5F83u, 0x0713u, 0x0754u, 0x5FC4u, 0xB674u, 0xEEE4u,
+ 0xB6FAu, 0xEE6Au, 0x07DAu, 0x5F4Au, 0x5F0Du, 0x079Du, 0xEE2Du, 0xB6BDu,
+ 0xB4C2u, 0xEC52u, 0x05E2u, 0x5D72u, 0x5D35u, 0x05A5u, 0xEC15u, 0xB485u,
+ 0xEC9Bu, 0xB40Bu, 0x5DBBu, 0x052Bu, 0x056Cu, 0x5DFCu, 0xB44Cu, 0xECDCu,
+ 0x0470u, 0x5CE0u, 0xB550u, 0xEDC0u, 0xED87u, 0xB517u, 0x5CA7u, 0x0437u,
+ 0x5C29u, 0x04B9u, 0xED09u, 0xB599u, 0xB5DEu, 0xED4Eu, 0x04FEu, 0x5C6Eu,
+ 0xBC22u, 0xE4B2u, 0x0D02u, 0x5592u, 0x55D5u, 0x0D45u, 0xE4F5u, 0xBC65u,
+ 0xE47Bu, 0xBCEBu, 0x555Bu, 0x0DCBu, 0x0D8Cu, 0x551Cu, 0xBCACu, 0xE43Cu,
+ 0x0C90u, 0x5400u, 0xBDB0u, 0xE520u, 0xE567u, 0xBDF7u, 0x5447u, 0x0CD7u,
+ 0x54C9u, 0x0C59u, 0xE5E9u, 0xBD79u, 0xBD3Eu, 0xE5AEu, 0x0C1Eu, 0x548Eu,
+ 0x56F1u, 0x0E61u, 0xE7D1u, 0xBF41u, 0xBF06u, 0xE796u, 0x0E26u, 0x56B6u,
+ 0x0EA8u, 0x5638u, 0xBF88u, 0xE718u, 0xE75Fu, 0xBFCFu, 0x567Fu, 0x0EEFu,
+ 0xE643u, 0xBED3u, 0x5763u, 0x0FF3u, 0x0FB4u, 0x5724u, 0xBE94u, 0xE604u,
+ 0xBE1Au, 0xE68Au, 0x0F3Au, 0x57AAu, 0x57EDu, 0x0F7Du, 0xE6CDu, 0xBE5Du,
+ 0xE233u, 0xBAA3u, 0x5313u, 0x0B83u, 0x0BC4u, 0x5354u, 0xBAE4u, 0xE274u,
+ 0xBA6Au, 0xE2FAu, 0x0B4Au, 0x53DAu, 0x539Du, 0x0B0Du, 0xE2BDu, 0xBA2Du,
+ 0x5281u, 0x0A11u, 0xE3A1u, 0xBB31u, 0xBB76u, 0xE3E6u, 0x0A56u, 0x52C6u,
+ 0x0AD8u, 0x5248u, 0xBBF8u, 0xE368u, 0xE32Fu, 0xBBBFu, 0x520Fu, 0x0A9Fu,
+ 0x08E0u, 0x5070u, 0xB9C0u, 0xE150u, 0xE117u, 0xB987u, 0x5037u, 0x08A7u,
+ 0x50B9u, 0x0829u, 0xE199u, 0xB909u, 0xB94Eu, 0xE1DEu, 0x086Eu, 0x50FEu,
+ 0xB852u, 0xE0C2u, 0x0972u, 0x51E2u, 0x51A5u, 0x0935u, 0xE085u, 0xB815u,
+ 0xE00Bu, 0xB89Bu, 0x512Bu, 0x09BBu, 0x09FCu, 0x516Cu, 0xB8DCu, 0xE04Cu
+ },
+ {
+ 0x0000u, 0xF3F3u, 0x6C51u, 0x9FA2u, 0xD8A2u, 0x2B51u, 0xB4F3u, 0x4700u,
+ 0x3AF3u, 0xC900u, 0x56A2u, 0xA551u, 0xE251u, 0x11A2u, 0x8E00u, 0x7DF3u,
+ 0x75E6u, 0x8615u, 0x19B7u, 0xEA44u, 0xAD44u, 0x5EB7u, 0xC115u, 0x32E6u,
+ 0x4F15u, 0xBCE6u, 0x2344u, 0xD0B7u, 0x97B7u, 0x6444u, 0xFBE6u, 0x0815u,
+ 0xEBCCu, 0x183Fu, 0x879Du, 0x746Eu, 0x336Eu, 0xC09Du, 0x5F3Fu, 0xACCCu,
+ 0xD13Fu, 0x22CCu, 0xBD6Eu, 0x4E9Du, 0x099Du, 0xFA6Eu, 0x65CCu, 0x963Fu,
+ 0x9E2Au, 0x6DD9u, 0xF27Bu, 0x0188u, 0x4688u, 0xB57Bu, 0x2AD9u, 0xD92Au,
+ 0xA4D9u, 0x572Au, 0xC888u, 0x3B7Bu, 0x7C7Bu, 0x8F88u, 0x102Au, 0xE3D9u,
+ 0x5C2Fu, 0xAFDCu, 0x307Eu, 0xC38Du, 0x848Du, 0x777Eu, 0xE8DCu, 0x1B2Fu,
+ 0x66DCu, 0x952Fu, 0x0A8Du, 0xF97Eu, 0xBE7Eu, 0x4D8Du, 0xD22Fu, 0x21DCu,
+ 0x29C9u, 0xDA3Au, 0x4598u, 0xB66Bu, 0xF16Bu, 0x0298u, 0x9D3Au, 0x6EC9u,
+ 0x133Au, 0xE0C9u, 0x7F6Bu, 0x8C98u, 0xCB98u, 0x386Bu, 0xA7C9u, 0x543Au,
+ 0xB7E3u, 0x4410u, 0xDBB2u, 0x2841u, 0x6F41u, 0x9CB2u, 0x0310u, 0xF0E3u,
+ 0x8D10u, 0x7EE3u, 0xE141u, 0x12B2u, 0x55B2u, 0xA641u, 0x39E3u, 0xCA10u,
+ 0xC205u, 0x31F6u, 0xAE54u, 0x5DA7u, 0x1AA7u, 0xE954u, 0x76F6u, 0x8505u,
+ 0xF8F6u, 0x0B05u, 0x94A7u, 0x6754u, 0x2054u, 0xD3A7u, 0x4C05u, 0xBFF6u,
+ 0xB85Eu, 0x4BADu, 0xD40Fu, 0x27FCu, 0x60FCu, 0x930Fu, 0x0CADu, 0xFF5Eu,
+ 0x82ADu, 0x715Eu, 0xEEFCu, 0x1D0Fu, 0x5A0Fu, 0xA9FCu, 0x365Eu, 0xC5ADu,
+ 0xCDB8u, 0x3E4Bu, 0xA1E9u, 0x521Au, 0x151Au, 0xE6E9u, 0x794Bu, 0x8AB8u,
+ 0xF74Bu, 0x04B8u, 0x9B1Au, 0x68E9u, 0x2FE9u, 0xDC1Au, 0x43B8u, 0xB04Bu,
+ 0x5392u, 0xA061u, 0x3FC3u, 0xCC30u, 0x8B30u, 0x78C3u, 0xE761u, 0x1492u,
+ 0x6961u, 0x9A92u, 0x0530u, 0xF6C3u, 0xB1C3u, 0x4230u, 0xDD92u, 0x2E61u,
+ 0x2674u, 0xD587u, 0x4A25u, 0xB9D6u, 0xFED6u, 0x0D25u, 0x9287u, 0x6174u,
+ 0x1C87u, 0xEF74u, 0x70D6u, 0x8325u, 0xC425u, 0x37D6u, 0xA874u, 0x5B87u,
+ 0xE471u, 0x1782u, 0x8820u, 0x7BD3u, 0x3CD3u, 0xCF20u, 0x5082u, 0xA371u,
+ 0xDE82u, 0x2D71u, 0xB2D3u, 0x4120u, 0x0620u, 0xF5D3u, 0x6A71u, 0x9982u,
+ 0x9197u, 0x6264u, 0xFDC6u, 0x0E35u, 0x4935u, 0xBAC6u, 0x2564u, 0xD697u,
+ 0xAB64u, 0x5897u, 0xC735u, 0x34C6u, 0x73C6u, 0x8035u, 0x1F97u, 0xEC64u,
+ 0x0FBDu, 0xFC4Eu, 0x63ECu, 0x901Fu, 0xD71Fu, 0x24ECu, 0xBB4Eu, 0x48BDu,
+ 0x354Eu, 0xC6BDu, 0x591Fu, 0xAAECu, 0xEDECu, 0x1E1Fu, 0x81BDu, 0x724Eu,
+ 0x7A5Bu, 0x89A8u, 0x160Au, 0xE5F9u, 0xA2F9u, 0x510Au, 0xCEA8u, 0x3D5Bu,
+ 0x40A8u, 0xB35Bu, 0x2CF9u, 0xDF0Au, 0x980Au, 0x6BF9u, 0xF45Bu, 0x07A8u
+ },
+ {
+ 0x0000u, 0xFB0Bu, 0x7DA1u, 0x86AAu, 0xFB42u, 0x0049u, 0x86E3u, 0x7DE8u,
+ 0x7D33u, 0x8638u, 0x0092u, 0xFB99u, 0x8671u, 0x7D7Au, 0xFBD0u, 0x00DBu,
+ 0xFA66u, 0x016Du, 0x87C7u, 0x7CCCu, 0x0124u, 0xFA2Fu, 0x7C85u, 0x878Eu,
+ 0x8755u, 0x7C5Eu, 0xFAF4u, 0x01FFu, 0x7C17u, 0x871Cu, 0x01B6u, 0xFABDu,
+ 0x7F7Bu, 0x8470u, 0x02DAu, 0xF9D1u, 0x8439u, 0x7F32u, 0xF998u, 0x0293u,
+ 0x0248u, 0xF943u, 0x7FE9u, 0x84E2u, 0xF90Au, 0x0201u, 0x84ABu, 0x7FA0u,
+ 0x851Du, 0x7E16u, 0xF8BCu, 0x03B7u, 0x7E5Fu, 0x8554u, 0x03FEu, 0xF8F5u,
+ 0xF82Eu, 0x0325u, 0x858Fu, 0x7E84u, 0x036Cu, 0xF867u, 0x7ECDu, 0x85C6u,
+ 0xFEF6u, 0x05FDu, 0x8357u, 0x785Cu, 0x05B4u, 0xFEBFu, 0x7815u, 0x831Eu,
+ 0x83C5u, 0x78CEu, 0xFE64u, 0x056Fu, 0x7887u, 0x838Cu, 0x0526u, 0xFE2Du,
+ 0x0490u, 0xFF9Bu, 0x7931u, 0x823Au, 0xFFD2u, 0x04D9u, 0x8273u, 0x7978u,
+ 0x79A3u, 0x82A8u, 0x0402u, 0xFF09u, 0x82E1u, 0x79EAu, 0xFF40u, 0x044Bu,
+ 0x818Du, 0x7A86u, 0xFC2Cu, 0x0727u, 0x7ACFu, 0x81C4u, 0x076Eu, 0xFC65u,
+ 0xFCBEu, 0x07B5u, 0x811Fu, 0x7A14u, 0x07FCu, 0xFCF7u, 0x7A5Du, 0x8156u,
+ 0x7BEBu, 0x80E0u, 0x064Au, 0xFD41u, 0x80A9u, 0x7BA2u, 0xFD08u, 0x0603u,
+ 0x06D8u, 0xFDD3u, 0x7B79u, 0x8072u, 0xFD9Au, 0x0691u, 0x803Bu, 0x7B30u,
+ 0x765Bu, 0x8D50u, 0x0BFAu, 0xF0F1u, 0x8D19u, 0x7612u, 0xF0B8u, 0x0BB3u,
+ 0x0B68u, 0xF063u, 0x76C9u, 0x8DC2u, 0xF02Au, 0x0B21u, 0x8D8Bu, 0x7680u,
+ 0x8C3Du, 0x7736u, 0xF19Cu, 0x0A97u, 0x777Fu, 0x8C74u, 0x0ADEu, 0xF1D5u,
+ 0xF10Eu, 0x0A05u, 0x8CAFu, 0x77A4u, 0x0A4Cu, 0xF147u, 0x77EDu, 0x8CE6u,
+ 0x0920u, 0xF22Bu, 0x7481u, 0x8F8Au, 0xF262u, 0x0969u, 0x8FC3u, 0x74C8u,
+ 0x7413u, 0x8F18u, 0x09B2u, 0xF2B9u, 0x8F51u, 0x745Au, 0xF2F0u, 0x09FBu,
+ 0xF346u, 0x084Du, 0x8EE7u, 0x75ECu, 0x0804u, 0xF30Fu, 0x75A5u, 0x8EAEu,
+ 0x8E75u, 0x757Eu, 0xF3D4u, 0x08DFu, 0x7537u, 0x8E3Cu, 0x0896u, 0xF39Du,
+ 0x88ADu, 0x73A6u, 0xF50Cu, 0x0E07u, 0x73EFu, 0x88E4u, 0x0E4Eu, 0xF545u,
+ 0xF59Eu, 0x0E95u, 0x883Fu, 0x7334u, 0x0EDCu, 0xF5D7u, 0x737Du, 0x8876u,
+ 0x72CBu, 0x89C0u, 0x0F6Au, 0xF461u, 0x8989u, 0x7282u, 0xF428u, 0x0F23u,
+ 0x0FF8u, 0xF4F3u, 0x7259u, 0x8952u, 0xF4BAu, 0x0FB1u, 0x891Bu, 0x7210u,
+ 0xF7D6u, 0x0CDDu, 0x8A77u, 0x717Cu, 0x0C94u, 0xF79Fu, 0x7135u, 0x8A3Eu,
+ 0x8AE5u, 0x71EEu, 0xF744u, 0x0C4Fu, 0x71A7u, 0x8AACu, 0x0C06u, 0xF70Du,
+ 0x0DB0u, 0xF6BBu, 0x7011u, 0x8B1Au, 0xF6F2u, 0x0DF9u, 0x8B53u, 0x7058u,
+ 0x7083u, 0x8B88u, 0x0D22u, 0xF629u, 0x8BC1u, 0x70CAu, 0xF660u, 0x0D6Bu
+ },
+ {
+ 0x0000u, 0xECB6u, 0x52DBu, 0xBE6Du, 0xA5B6u, 0x4900u, 0xF76Du, 0x1BDBu,
+ 0xC0DBu, 0x2C6Du, 0x9200u, 0x7EB6u, 0x656Du, 0x89DBu, 0x37B6u, 0xDB00u,
+ 0x0A01u, 0xE6B7u, 0x58DAu, 0xB46Cu, 0xAFB7u, 0x4301u, 0xFD6Cu, 0x11DAu,
+ 0xCADAu, 0x266Cu, 0x9801u, 0x74B7u, 0x6F6Cu, 0x83DAu, 0x3DB7u, 0xD101u,
+ 0x1402u, 0xF8B4u, 0x46D9u, 0xAA6Fu, 0xB1B4u, 0x5D02u, 0xE36Fu, 0x0FD9u,
+ 0xD4D9u, 0x386Fu, 0x8602u, 0x6AB4u, 0x716Fu, 0x9DD9u, 0x23B4u, 0xCF02u,
+ 0x1E03u, 0xF2B5u, 0x4CD8u, 0xA06Eu, 0xBBB5u, 0x5703u, 0xE96Eu, 0x05D8u,
+ 0xDED8u, 0x326Eu, 0x8C03u, 0x60B5u, 0x7B6Eu, 0x97D8u, 0x29B5u, 0xC503u,
+ 0x2804u, 0xC4B2u, 0x7ADFu, 0x9669u, 0x8DB2u, 0x6104u, 0xDF69u, 0x33DFu,
+ 0xE8DFu, 0x0469u, 0xBA04u, 0x56B2u, 0x4D69u, 0xA1DFu, 0x1FB2u, 0xF304u,
+ 0x2205u, 0xCEB3u, 0x70DEu, 0x9C68u, 0x87B3u, 0x6B05u, 0xD568u, 0x39DEu,
+ 0xE2DEu, 0x0E68u, 0xB005u, 0x5CB3u, 0x4768u, 0xABDEu, 0x15B3u, 0xF905u,
+ 0x3C06u, 0xD0B0u, 0x6EDDu, 0x826Bu, 0x99B0u, 0x7506u, 0xCB6Bu, 0x27DDu,
+ 0xFCDDu, 0x106Bu, 0xAE06u, 0x42B0u, 0x596Bu, 0xB5DDu, 0x0BB0u, 0xE706u,
+ 0x3607u, 0xDAB1u, 0x64DCu, 0x886Au, 0x93B1u, 0x7F07u, 0xC16Au, 0x2DDCu,
+ 0xF6DCu, 0x1A6Au, 0xA407u, 0x48B1u, 0x536Au, 0xBFDCu, 0x01B1u, 0xED07u,
+ 0x5008u, 0xBCBEu, 0x02D3u, 0xEE65u, 0xF5BEu, 0x1908u, 0xA765u, 0x4BD3u,
+ 0x90D3u, 0x7C65u, 0xC208u, 0x2EBEu, 0x3565u, 0xD9D3u, 0x67BEu, 0x8B08u,
+ 0x5A09u, 0xB6BFu, 0x08D2u, 0xE464u, 0xFFBFu, 0x1309u, 0xAD64u, 0x41D2u,
+ 0x9AD2u, 0x7664u, 0xC809u, 0x24BFu, 0x3F64u, 0xD3D2u, 0x6DBFu, 0x8109u,
+ 0x440Au, 0xA8BCu, 0x16D1u, 0xFA67u, 0xE1BCu, 0x0D0Au, 0xB367u, 0x5FD1u,
+ 0x84D1u, 0x6867u, 0xD60Au, 0x3ABCu, 0x2167u, 0xCDD1u, 0x73BCu, 0x9F0Au,
+ 0x4E0Bu, 0xA2BDu, 0x1CD0u, 0xF066u, 0xEBBDu, 0x070Bu, 0xB966u, 0x55D0u,
+ 0x8ED0u, 0x6266u, 0xDC0Bu, 0x30BDu, 0x2B66u, 0xC7D0u, 0x79BDu, 0x950Bu,
+ 0x780Cu, 0x94BAu, 0x2AD7u, 0xC661u, 0xDDBAu, 0x310Cu, 0x8F61u, 0x63D7u,
+ 0xB8D7u, 0x5461u, 0xEA0Cu, 0x06BAu, 0x1D61u, 0xF1D7u, 0x4FBAu, 0xA30Cu,
+ 0x720Du, 0x9EBBu, 0x20D6u, 0xCC60u, 0xD7BBu, 0x3B0Du, 0x8560u, 0x69D6u,
+ 0xB2D6u, 0x5E60u, 0xE00Du, 0x0CBBu, 0x1760u, 0xFBD6u, 0x45BBu, 0xA90Du,
+ 0x6C0Eu, 0x80B8u, 0x3ED5u, 0xD263u, 0xC9B8u, 0x250Eu, 0x9B63u, 0x77D5u,
+ 0xACD5u, 0x4063u, 0xFE0Eu, 0x12B8u, 0x0963u, 0xE5D5u, 0x5BB8u, 0xB70Eu,
+ 0x660Fu, 0x8AB9u, 0x34D4u, 0xD862u, 0xC3B9u, 0x2F0Fu, 0x9162u, 0x7DD4u,
+ 0xA6D4u, 0x4A62u, 0xF40Fu, 0x18B9u, 0x0362u, 0xEFD4u, 0x51B9u, 0xBD0Fu
+ },
+ {
+ 0x0000u, 0xA010u, 0xCB97u, 0x6B87u, 0x1C99u, 0xBC89u, 0xD70Eu, 0x771Eu,
+ 0x3932u, 0x9922u, 0xF2A5u, 0x52B5u, 0x25ABu, 0x85BBu, 0xEE3Cu, 0x4E2Cu,
+ 0x7264u, 0xD274u, 0xB9F3u, 0x19E3u, 0x6EFDu, 0xCEEDu, 0xA56Au, 0x057Au,
+ 0x4B56u, 0xEB46u, 0x80C1u, 0x20D1u, 0x57CFu, 0xF7DFu, 0x9C58u, 0x3C48u,
+ 0xE4C8u, 0x44D8u, 0x2F5Fu, 0x8F4Fu, 0xF851u, 0x5841u, 0x33C6u, 0x93D6u,
+ 0xDDFAu, 0x7DEAu, 0x166Du, 0xB67Du, 0xC163u, 0x6173u, 0x0AF4u, 0xAAE4u,
+ 0x96ACu, 0x36BCu, 0x5D3Bu, 0xFD2Bu, 0x8A35u, 0x2A25u, 0x41A2u, 0xE1B2u,
+ 0xAF9Eu, 0x0F8Eu, 0x6409u, 0xC419u, 0xB307u, 0x1317u, 0x7890u, 0xD880u,
+ 0x4227u, 0xE237u, 0x89B0u, 0x29A0u, 0x5EBEu, 0xFEAEu, 0x9529u, 0x3539u,
+ 0x7B15u, 0xDB05u, 0xB082u, 0x1092u, 0x678Cu, 0xC79Cu, 0xAC1Bu, 0x0C0Bu,
+ 0x3043u, 0x9053u, 0xFBD4u, 0x5BC4u, 0x2CDAu, 0x8CCAu, 0xE74Du, 0x475Du,
+ 0x0971u, 0xA961u, 0xC2E6u, 0x62F6u, 0x15E8u, 0xB5F8u, 0xDE7Fu, 0x7E6Fu,
+ 0xA6EFu, 0x06FFu, 0x6D78u, 0xCD68u, 0xBA76u, 0x1A66u, 0x71E1u, 0xD1F1u,
+ 0x9FDDu, 0x3FCDu, 0x544Au, 0xF45Au, 0x8344u, 0x2354u, 0x48D3u, 0xE8C3u,
+ 0xD48Bu, 0x749Bu, 0x1F1Cu, 0xBF0Cu, 0xC812u, 0x6802u, 0x0385u, 0xA395u,
+ 0xEDB9u, 0x4DA9u, 0x262Eu, 0x863Eu, 0xF120u, 0x5130u, 0x3AB7u, 0x9AA7u,
+ 0x844Eu, 0x245Eu, 0x4FD9u, 0xEFC9u, 0x98D7u, 0x38C7u, 0x5340u, 0xF350u,
+ 0xBD7Cu, 0x1D6Cu, 0x76EBu, 0xD6FBu, 0xA1E5u, 0x01F5u, 0x6A72u, 0xCA62u,
+ 0xF62Au, 0x563Au, 0x3DBDu, 0x9DADu, 0xEAB3u, 0x4AA3u, 0x2124u, 0x8134u,
+ 0xCF18u, 0x6F08u, 0x048Fu, 0xA49Fu, 0xD381u, 0x7391u, 0x1816u, 0xB806u,
+ 0x6086u, 0xC096u, 0xAB11u, 0x0B01u, 0x7C1Fu, 0xDC0Fu, 0xB788u, 0x1798u,
+ 0x59B4u, 0xF9A4u, 0x9223u, 0x3233u, 0x452Du, 0xE53Du, 0x8EBAu, 0x2EAAu,
+ 0x12E2u, 0xB2F2u, 0xD975u, 0x7965u, 0x0E7Bu, 0xAE6Bu, 0xC5ECu, 0x65FCu,
+ 0x2BD0u, 0x8BC0u, 0xE047u, 0x4057u, 0x3749u, 0x9759u, 0xFCDEu, 0x5CCEu,
+ 0xC669u, 0x6679u, 0x0DFEu, 0xADEEu, 0xDAF0u, 0x7AE0u, 0x1167u, 0xB177u,
+ 0xFF5Bu, 0x5F4Bu, 0x34CCu, 0x94DCu, 0xE3C2u, 0x43D2u, 0x2855u, 0x8845u,
+ 0xB40Du, 0x141Du, 0x7F9Au, 0xDF8Au, 0xA894u, 0x0884u, 0x6303u, 0xC313u,
+ 0x8D3Fu, 0x2D2Fu, 0x46A8u, 0xE6B8u, 0x91A6u, 0x31B6u, 0x5A31u, 0xFA21u,
+ 0x22A1u, 0x82B1u, 0xE936u, 0x4926u, 0x3E38u, 0x9E28u, 0xF5AFu, 0x55BFu,
+ 0x1B93u, 0xBB83u, 0xD004u, 0x7014u, 0x070Au, 0xA71Au, 0xCC9Du, 0x6C8Du,
+ 0x50C5u, 0xF0D5u, 0x9B52u, 0x3B42u, 0x4C5Cu, 0xEC4Cu, 0x87CBu, 0x27DBu,
+ 0x69F7u, 0xC9E7u, 0xA260u, 0x0270u, 0x756Eu, 0xD57Eu, 0xBEF9u, 0x1EE9u
+ },
+ {
+ 0x0000u, 0x832Bu, 0x8DE1u, 0x0ECAu, 0x9075u, 0x135Eu, 0x1D94u, 0x9EBFu,
+ 0xAB5Du, 0x2876u, 0x26BCu, 0xA597u, 0x3B28u, 0xB803u, 0xB6C9u, 0x35E2u,
+ 0xDD0Du, 0x5E26u, 0x50ECu, 0xD3C7u, 0x4D78u, 0xCE53u, 0xC099u, 0x43B2u,
+ 0x7650u, 0xF57Bu, 0xFBB1u, 0x789Au, 0xE625u, 0x650Eu, 0x6BC4u, 0xE8EFu,
+ 0x31ADu, 0xB286u, 0xBC4Cu, 0x3F67u, 0xA1D8u, 0x22F3u, 0x2C39u, 0xAF12u,
+ 0x9AF0u, 0x19DBu, 0x1711u, 0x943Au, 0x0A85u, 0x89AEu, 0x8764u, 0x044Fu,
+ 0xECA0u, 0x6F8Bu, 0x6141u, 0xE26Au, 0x7CD5u, 0xFFFEu, 0xF134u, 0x721Fu,
+ 0x47FDu, 0xC4D6u, 0xCA1Cu, 0x4937u, 0xD788u, 0x54A3u, 0x5A69u, 0xD942u,
+ 0x635Au, 0xE071u, 0xEEBBu, 0x6D90u, 0xF32Fu, 0x7004u, 0x7ECEu, 0xFDE5u,
+ 0xC807u, 0x4B2Cu, 0x45E6u, 0xC6CDu, 0x5872u, 0xDB59u, 0xD593u, 0x56B8u,
+ 0xBE57u, 0x3D7Cu, 0x33B6u, 0xB09Du, 0x2E22u, 0xAD09u, 0xA3C3u, 0x20E8u,
+ 0x150Au, 0x9621u, 0x98EBu, 0x1BC0u, 0x857Fu, 0x0654u, 0x089Eu, 0x8BB5u,
+ 0x52F7u, 0xD1DCu, 0xDF16u, 0x5C3Du, 0xC282u, 0x41A9u, 0x4F63u, 0xCC48u,
+ 0xF9AAu, 0x7A81u, 0x744Bu, 0xF760u, 0x69DFu, 0xEAF4u, 0xE43Eu, 0x6715u,
+ 0x8FFAu, 0x0CD1u, 0x021Bu, 0x8130u, 0x1F8Fu, 0x9CA4u, 0x926Eu, 0x1145u,
+ 0x24A7u, 0xA78Cu, 0xA946u, 0x2A6Du, 0xB4D2u, 0x37F9u, 0x3933u, 0xBA18u,
+ 0xC6B4u, 0x459Fu, 0x4B55u, 0xC87Eu, 0x56C1u, 0xD5EAu, 0xDB20u, 0x580Bu,
+ 0x6DE9u, 0xEEC2u, 0xE008u, 0x6323u, 0xFD9Cu, 0x7EB7u, 0x707Du, 0xF356u,
+ 0x1BB9u, 0x9892u, 0x9658u, 0x1573u, 0x8BCCu, 0x08E7u, 0x062Du, 0x8506u,
+ 0xB0E4u, 0x33CFu, 0x3D05u, 0xBE2Eu, 0x2091u, 0xA3BAu, 0xAD70u, 0x2E5Bu,
+ 0xF719u, 0x7432u, 0x7AF8u, 0xF9D3u, 0x676Cu, 0xE447u, 0xEA8Du, 0x69A6u,
+ 0x5C44u, 0xDF6Fu, 0xD1A5u, 0x528Eu, 0xCC31u, 0x4F1Au, 0x41D0u, 0xC2FBu,
+ 0x2A14u, 0xA93Fu, 0xA7F5u, 0x24DEu, 0xBA61u, 0x394Au, 0x3780u, 0xB4ABu,
+ 0x8149u, 0x0262u, 0x0CA8u, 0x8F83u, 0x113Cu, 0x9217u, 0x9CDDu, 0x1FF6u,
+ 0xA5EEu, 0x26C5u, 0x280Fu, 0xAB24u, 0x359Bu, 0xB6B0u, 0xB87Au, 0x3B51u,
+ 0x0EB3u, 0x8D98u, 0x8352u, 0x0079u, 0x9EC6u, 0x1DEDu, 0x1327u, 0x900Cu,
+ 0x78E3u, 0xFBC8u, 0xF502u, 0x7629u, 0xE896u, 0x6BBDu, 0x6577u, 0xE65Cu,
+ 0xD3BEu, 0x5095u, 0x5E5Fu, 0xDD74u, 0x43CBu, 0xC0E0u, 0xCE2Au, 0x4D01u,
+ 0x9443u, 0x1768u, 0x19A2u, 0x9A89u, 0x0436u, 0x871Du, 0x89D7u, 0x0AFCu,
+ 0x3F1Eu, 0xBC35u, 0xB2FFu, 0x31D4u, 0xAF6Bu, 0x2C40u, 0x228Au, 0xA1A1u,
+ 0x494Eu, 0xCA65u, 0xC4AFu, 0x4784u, 0xD93Bu, 0x5A10u, 0x54DAu, 0xD7F1u,
+ 0xE213u, 0x6138u, 0x6FF2u, 0xECD9u, 0x7266u, 0xF14Du, 0xFF87u, 0x7CACu
+ }
};

__u16 crc_t10dif_generic(__u16 crc, const unsigned char *buffer, size_t len)
{
- unsigned int i;
+ const __u8 *i = (const __u8 *)buffer;
+ const __u8 *i_end = i + len;
+ const __u8 *i_last16 = i + (len / 16 * 16);

- for (i = 0 ; i < len ; i++)
- crc = (crc << 8) ^ t10_dif_crc_table[((crc >> 8) ^ buffer[i]) & 0xff];
+ for (; i < i_last16; i += 16) {
+ crc = t10_dif_crc_table[15][i[0] ^ (__u8)(crc >> 8)] ^
+ t10_dif_crc_table[14][i[1] ^ (__u8)(crc >> 0)] ^
+ t10_dif_crc_table[13][i[2]] ^
+ t10_dif_crc_table[12][i[3]] ^
+ t10_dif_crc_table[11][i[4]] ^
+ t10_dif_crc_table[10][i[5]] ^
+ t10_dif_crc_table[9][i[6]] ^
+ t10_dif_crc_table[8][i[7]] ^
+ t10_dif_crc_table[7][i[8]] ^
+ t10_dif_crc_table[6][i[9]] ^
+ t10_dif_crc_table[5][i[10]] ^
+ t10_dif_crc_table[4][i[11]] ^
+ t10_dif_crc_table[3][i[12]] ^
+ t10_dif_crc_table[2][i[13]] ^
+ t10_dif_crc_table[1][i[14]] ^
+ t10_dif_crc_table[0][i[15]];
+ }
+
+ for (; i < i_end; i++)
+ crc = t10_dif_crc_table[0][*i ^ (__u8)(crc >> 8)] ^ (crc << 8);

return crc;
}
--
1.8.3.1
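
For readers who would rather check the tables than trust 4096 literals: entry [k][b] appears to be the CRC of byte b followed by k zero bytes, so each table can be derived from the previous one. What follows is a minimal sketch under that assumption, not the patch itself. The names tbl, t10dif_build_tables(), t10dif_bytewise() and t10dif_slice16() are made up for illustration, and the tables are built at run time instead of being const data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define T10DIF_POLY 0x8BB7u

/* Illustrative run-time copy of the patch's t10_dif_crc_table[16][256]. */
static uint16_t tbl[16][256];

static void t10dif_build_tables(void)
{
	unsigned int b, k, bit;

	/* tbl[0][b]: CRC of the single byte b (the original 256-entry table). */
	for (b = 0; b < 256; b++) {
		uint16_t crc = (uint16_t)(b << 8);

		for (bit = 0; bit < 8; bit++)
			crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ T10DIF_POLY)
					      : (uint16_t)(crc << 1);
		tbl[0][b] = crc;
	}

	/* tbl[k][b]: CRC of byte b followed by k zero bytes. */
	for (k = 1; k < 16; k++)
		for (b = 0; b < 256; b++) {
			uint16_t prev = tbl[k - 1][b];

			tbl[k][b] = (uint16_t)((prev << 8) ^ tbl[0][prev >> 8]);
		}
}

/* The old loop: one table lookup per input byte. */
static uint16_t t10dif_bytewise(uint16_t crc, const uint8_t *p, size_t len)
{
	assert(p != NULL || len == 0);
	while (len--)
		crc = (uint16_t)((crc << 8) ^ tbl[0][(crc >> 8) ^ *p++]);
	return crc;
}

/* The new loop: 16 independent lookups per 16 input bytes, then a tail. */
static uint16_t t10dif_slice16(uint16_t crc, const uint8_t *p, size_t len)
{
	assert(p != NULL || len == 0);
	for (; len >= 16; p += 16, len -= 16) {
		uint16_t acc = tbl[15][p[0] ^ (crc >> 8)] ^
			       tbl[14][p[1] ^ (crc & 0xff)];
		unsigned int k;

		for (k = 2; k < 16; k++)
			acc ^= tbl[15 - k][p[k]];
		crc = acc;
	}
	return t10dif_bytewise(crc, p, len);
}
```

After t10dif_build_tables(), tbl[1][1] is 0x7562 and tbl[15][1] is 0x832B, matching the first non-zero entries of the corresponding tables posted in this thread, and the two functions agree on any buffer.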


2018-08-10 19:23:06

by Joe Perches

Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.

On Fri, 2018-08-10 at 14:12 -0500, Jeff Lien wrote:
> This patch provides a performance improvement for the CRC16 calculations done in read/write
> workloads using the T10 Type 1/2/3 guard field. For example, today with sequential write
> workloads (one thread/CPU of IO) we consume 100% of the CPU because of the CRC16 computation
> bottleneck. Today's block devices are considerably faster, but the CRC16 calculation prevents
> folks from utilizing the throughput of such devices. To speed up this calculation and expose
> the block device throughput, we slice the old single byte for loop into a 16 byte for loop,
> with a larger CRC table to match. The result has shown 5x performance improvements on various
> big endian and little endian systems running the 4.18.0 kernel version.

Thanks.

This seems a sensible tradeoff for the 4k text size increase.

> diff --git a/crypto/crct10dif_common.c b/crypto/crct10dif_common.c

[]

trivia:

> +static const __u16 t10_dif_crc_table[16][256] = {
> + {
> + 0x0000u, 0x8BB7u, 0x9CD9u, 0x176Eu, 0xB205u, 0x39B2u, 0x2EDCu, 0xA56Bu,

All the 'u's are unnecessary visual noise.

2018-08-10 20:02:23

by Nicolas Pitre

Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.

On Fri, 10 Aug 2018, Joe Perches wrote:

> On Fri, 2018-08-10 at 14:12 -0500, Jeff Lien wrote:
> > This patch provides a performance improvement for the CRC16 calculations done in read/write
> > workloads using the T10 Type 1/2/3 guard field. For example, today with sequential write
> > workloads (one thread/CPU of IO) we consume 100% of the CPU because of the CRC16 computation
> > bottleneck. Today's block devices are considerably faster, but the CRC16 calculation prevents
> > folks from utilizing the throughput of such devices. To speed up this calculation and expose
> > the block device throughput, we slice the old single byte for loop into a 16 byte for loop,
> > with a larger CRC table to match. The result has shown 5x performance improvements on various
> > big endian and little endian systems running the 4.18.0 kernel version.
>
> Thanks.
>
> This seems a sensible tradeoff for the 4k text size increase.

More like 7.5KB. Would be best if this was configurable so the small
version remained available.


Nicolas
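
The 7.5KB figure can be reproduced from the table dimensions alone, assuming 256 entries of 16 bits per slice and no padding. The helper names below are made up for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes of lookup data for slice-by-n: n tables of 256 16-bit entries. */
static size_t t10dif_table_bytes(unsigned int slices)
{
	assert(slices >= 1);
	return (size_t)slices * 256u * sizeof(uint16_t);
}

/* Growth relative to the original single 256-entry table. */
static size_t t10dif_table_growth(unsigned int slices)
{
	return t10dif_table_bytes(slices) - t10dif_table_bytes(1);
}
```

For 16 slices this gives 8192 bytes of table, 7680 bytes (7.5 * 1024) more than the 512 bytes used today. Any change in code size from the longer loop comes on top of that.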

2018-08-10 20:00:12

by Nicolas Pitre

Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.

On Fri, 10 Aug 2018, Jeff Lien wrote:

> This patch provides a performance improvement for the CRC16 calculations done in read/write
> workloads using the T10 Type 1/2/3 guard field. For example, today with sequential write
> workloads (one thread/CPU of IO) we consume 100% of the CPU because of the CRC16 computation
> bottleneck. Today's block devices are considerably faster, but the CRC16 calculation prevents
> folks from utilizing the throughput of such devices. To speed up this calculation and expose
> the block device throughput, we slice the old single byte for loop into a 16 byte for loop,
> with a larger CRC table to match. The result has shown 5x performance improvements on various
> big endian and little endian systems running the 4.18.0 kernel version.

You are nevertheless increasing the kernel size by 7.5 KB.

Could the small table still be preserved with a config option for those
who require small more than fast?

That could look like:

static const __u16 t10_dif_crc_table[][256] = {
{
[...]
},
#ifndef CONFIG_CRC16_SMALL
{
[...]
[...]
},
#endif
};

and the code to suit.


Nicolas
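
One way "the code to suit" might look, as a rough sketch only: the 16-byte loop is compiled out when the small option is set, and the byte-wise tail loop then handles everything. CRC16_SMALL_SKETCH is a hypothetical stand-in for CONFIG_CRC16_SMALL, and tbl, t10dif_build_tables() and crc_t10dif_sketch() are illustrative names. The tables are generated at run time here only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CRC16_SMALL_SKETCH is a hypothetical stand-in for CONFIG_CRC16_SMALL. */
#ifdef CRC16_SMALL_SKETCH
#define T10DIF_SLICES 1
#else
#define T10DIF_SLICES 16
#endif

static uint16_t tbl[T10DIF_SLICES][256];

static void t10dif_build_tables(void)
{
	unsigned int b, k, bit;

	for (b = 0; b < 256; b++) {
		uint16_t crc = (uint16_t)(b << 8);

		for (bit = 0; bit < 8; bit++)
			crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x8BB7u)
					      : (uint16_t)(crc << 1);
		tbl[0][b] = crc;
	}
	/* Only the configured number of extra tables is filled in. */
	for (k = 1; k < T10DIF_SLICES; k++)
		for (b = 0; b < 256; b++)
			tbl[k][b] = (uint16_t)((tbl[k - 1][b] << 8) ^
					       tbl[0][tbl[k - 1][b] >> 8]);
}

static uint16_t crc_t10dif_sketch(uint16_t crc, const uint8_t *p, size_t len)
{
	assert(p != NULL || len == 0);

#if T10DIF_SLICES == 16
	/* Fast path, present only when the large tables are built in. */
	for (; len >= 16; p += 16, len -= 16) {
		uint16_t acc = tbl[15][p[0] ^ (crc >> 8)] ^
			       tbl[14][p[1] ^ (crc & 0xff)];
		unsigned int k;

		for (k = 2; k < 16; k++)
			acc ^= tbl[15 - k][p[k]];
		crc = acc;
	}
#endif
	/* Byte-wise loop: the whole buffer when small, the tail otherwise. */
	while (len--)
		crc = (uint16_t)((crc << 8) ^ tbl[0][(crc >> 8) ^ *p++]);
	return crc;
}
```

Both configurations return the same CRC for the same input; only table size and speed differ.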

2018-08-10 20:16:02

by Eric Biggers

Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.

On Fri, Aug 10, 2018 at 02:12:11PM -0500, Jeff Lien wrote:
> This patch provides a performance improvement for the CRC16 calculations done in read/write
> workloads using the T10 Type 1/2/3 guard field. For example, today with sequential write
> workloads (one thread/CPU of IO) we consume 100% of the CPU because of the CRC16 computation
> bottleneck. Today's block devices are considerably faster, but the CRC16 calculation prevents
> folks from utilizing the throughput of such devices. To speed up this calculation and expose
> the block device throughput, we slice the old single byte for loop into a 16 byte for loop,
> with a larger CRC table to match. The result has shown 5x performance improvements on various
> big endian and little endian systems running the 4.18.0 kernel version.
>
> FIO Sequential Write, 64K Block Size, Queue Depth 64
> BE Base Kernel: bw=201.5 MiB/s
> BE Modified CRC Calc: bw=968.1 MiB/s
> 4.80x performance improvement
>
> LE Base Kernel: bw=357 MiB/s
> LE Modified CRC Calc: bw=1964 MiB/s
> 5.51x performance improvement
>
> FIO Sequential Read, 64K Block Size, Queue Depth 64
> BE Base Kernel: bw=611.2 MiB/s
> BE Modified CRC calc: bw=684.9 MiB/s
> 1.12x performance improvement
>
> LE Base Kernel: bw=797 MiB/s
> LE Modified CRC Calc: bw=2730 MiB/s
> 3.42x performance improvement

Did you also test the slice-by-4 (requires 2048-byte table) and slice-by-8
(requires 4096-byte table) methods? Your proposal is slice-by-16 (requires
8192-byte table); the original was slice-by-1 (requires 512-byte table).

> __u16 crc_t10dif_generic(__u16 crc, const unsigned char *buffer, size_t len)
> {
> - unsigned int i;
> + const __u8 *i = (const __u8 *)buffer;
> + const __u8 *i_end = i + len;
> + const __u8 *i_last16 = i + (len / 16 * 16);

'i' is normally a loop counter, not a pointer.
Use 'p', 'p_end', and 'p_last16'.

>
> - for (i = 0 ; i < len ; i++)
> - crc = (crc << 8) ^ t10_dif_crc_table[((crc >> 8) ^ buffer[i]) & 0xff];
> + for (; i < i_last16; i += 16) {
> + crc = t10_dif_crc_table[15][i[0] ^ (__u8)(crc >> 8)] ^
> + t10_dif_crc_table[14][i[1] ^ (__u8)(crc >> 0)] ^
> + t10_dif_crc_table[13][i[2]] ^
> + t10_dif_crc_table[12][i[3]] ^
> + t10_dif_crc_table[11][i[4]] ^
> + t10_dif_crc_table[10][i[5]] ^
> + t10_dif_crc_table[9][i[6]] ^
> + t10_dif_crc_table[8][i[7]] ^
> + t10_dif_crc_table[7][i[8]] ^
> + t10_dif_crc_table[6][i[9]] ^
> + t10_dif_crc_table[5][i[10]] ^
> + t10_dif_crc_table[4][i[11]] ^
> + t10_dif_crc_table[3][i[12]] ^
> + t10_dif_crc_table[2][i[13]] ^
> + t10_dif_crc_table[1][i[14]] ^
> + t10_dif_crc_table[0][i[15]];
> + }

Please indent this properly.

crc = t10_dif_crc_table[15][i[0] ^ (__u8)(crc >> 8)] ^
t10_dif_crc_table[14][i[1] ^ (__u8)(crc >> 0)] ^
t10_dif_crc_table[13][i[2]] ^
t10_dif_crc_table[12][i[3]] ^
t10_dif_crc_table[11][i[4]] ^
...

- Eric
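
For comparison, a slice-by-4 variant could look roughly like the sketch below. It uses the same table construction with only four tables (2048 bytes) and follows the indentation asked for above. The names tbl4, t10dif_build_tables4() and crc_t10dif_slice4() are illustrative, and nothing here has been benchmarked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Four tables of 256 16-bit entries: 2048 bytes. */
static uint16_t tbl4[4][256];

static void t10dif_build_tables4(void)
{
	unsigned int b, k, bit;

	for (b = 0; b < 256; b++) {
		uint16_t crc = (uint16_t)(b << 8);

		for (bit = 0; bit < 8; bit++)
			crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x8BB7u)
					      : (uint16_t)(crc << 1);
		tbl4[0][b] = crc;
	}
	/* tbl4[k][b]: CRC of byte b followed by k zero bytes. */
	for (k = 1; k < 4; k++)
		for (b = 0; b < 256; b++)
			tbl4[k][b] = (uint16_t)((tbl4[k - 1][b] << 8) ^
						tbl4[0][tbl4[k - 1][b] >> 8]);
}

static uint16_t crc_t10dif_slice4(uint16_t crc, const uint8_t *p, size_t len)
{
	assert(p != NULL || len == 0);

	/* Four lookups per four input bytes. */
	for (; len >= 4; p += 4, len -= 4)
		crc = tbl4[3][p[0] ^ (crc >> 8)] ^
		      tbl4[2][p[1] ^ (crc & 0xff)] ^
		      tbl4[1][p[2]] ^
		      tbl4[0][p[3]];

	/* Up to three trailing bytes, one lookup each. */
	while (len--)
		crc = (uint16_t)((crc << 8) ^ tbl4[0][(crc >> 8) ^ *p++]);
	return crc;
}
```

A slice-by-8 version follows the same pattern with eight tables and eight lookups per iteration.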

2018-08-10 20:56:15

by Douglas Gilbert

Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.

On 2018-08-10 03:12 PM, Jeff Lien wrote:
> This patch provides a performance improvement for the CRC16 calculations done in read/write
> workloads using the T10 Type 1/2/3 guard field. For example, today with sequential write
> workloads (one thread/CPU of IO) we consume 100% of the CPU because of the CRC16 computation
> bottleneck. Today's block devices are considerably faster, but the CRC16 calculation prevents
> folks from utilizing the throughput of such devices. To speed up this calculation and expose
> the block device throughput, we slice the old single byte for loop into a 16 byte for loop,
> with a larger CRC table to match. The result has shown 5x performance improvements on various
> big endian and little endian systems running the 4.18.0 kernel version.
>
> FIO Sequential Write, 64K Block Size, Queue Depth 64
> BE Base Kernel: bw=201.5 MiB/s
> BE Modified CRC Calc: bw=968.1 MiB/s
> 4.80x performance improvement
>
> LE Base Kernel: bw=357 MiB/s
> LE Modified CRC Calc: bw=1964 MiB/s
> 5.51x performance improvement
>
> FIO Sequential Read, 64K Block Size, Queue Depth 64
> BE Base Kernel: bw=611.2 MiB/s
> BE Modified CRC calc: bw=684.9 MiB/s
> 1.12x performance improvement
>
> LE Base Kernel: bw=797 MiB/s
> LE Modified CRC Calc: bw=2730 MiB/s
> 3.42x performance improvement
>
> Reviewed-by: Dave Darrington <[email protected]>
> Reviewed-by: Jeff Furlong <[email protected]>
> Signed-off-by: Jeff Lien <[email protected]>
> ---
> crypto/crct10dif_common.c | 605 +++++++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 569 insertions(+), 36 deletions(-)
>
> diff --git a/crypto/crct10dif_common.c b/crypto/crct10dif_common.c
> index b2fab36..40e1d6c 100644
> --- a/crypto/crct10dif_common.c
> +++ b/crypto/crct10dif_common.c
> @@ -32,47 +32,580 @@
> * x^16 + x^15 + x^11 + x^9 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1
> * gt: 0x8bb7
> */
> -static const __u16 t10_dif_crc_table[256] = {

<snip table>
>
> __u16 crc_t10dif_generic(__u16 crc, const unsigned char *buffer, size_t len)
> {
> - unsigned int i;
> + const __u8 *i = (const __u8 *)buffer;
> + const __u8 *i_end = i + len;
> + const __u8 *i_last16 = i + (len / 16 * 16);
>
> - for (i = 0 ; i < len ; i++)
> - crc = (crc << 8) ^ t10_dif_crc_table[((crc >> 8) ^ buffer[i]) & 0xff];
> + for (; i < i_last16; i += 16) {
> + crc = t10_dif_crc_table[15][i[0] ^ (__u8)(crc >> 8)] ^

The bswap_16() macro may be faster than crc >> 8.

> + t10_dif_crc_table[14][i[1] ^ (__u8)(crc >> 0)] ^

How is (crc >> 0) different from crc?

> + t10_dif_crc_table[13][i[2]] ^
> + t10_dif_crc_table[12][i[3]] ^
> + t10_dif_crc_table[11][i[4]] ^
> + t10_dif_crc_table[10][i[5]] ^
> + t10_dif_crc_table[9][i[6]] ^
> + t10_dif_crc_table[8][i[7]] ^
> + t10_dif_crc_table[7][i[8]] ^
> + t10_dif_crc_table[6][i[9]] ^
> + t10_dif_crc_table[5][i[10]] ^
> + t10_dif_crc_table[4][i[11]] ^
> + t10_dif_crc_table[3][i[12]] ^
> + t10_dif_crc_table[2][i[13]] ^
> + t10_dif_crc_table[1][i[14]] ^
> + t10_dif_crc_table[0][i[15]];

Since n in i[n] marches from 0 to 15, all but the first (i.e. i[0])
could be replaced by *(++i). The first for loop statement would then
become:
for (; i < i_last16; ++i) {

The two dimensional indexing could be flattened to further (ugly) pointer
manipulations, perhaps gaining some cycles, at the expense of clarity.
If so you could keep some of the two dimensional indexing lines commented
for documentation of the intent.

Doug Gilbert

> + }
> +
> + for (; i < i_end; i++)
> + crc = t10_dif_crc_table[0][*i ^ (__u8)(crc >> 8)] ^ (crc << 8);
>
> return crc;
> }
>
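
A small sketch of two of the points above, under stated assumptions. First, (crc >> 0) is just crc, so the cast alone selects the low byte. Second, the pointer can be walked with ++, but in C several ++ on the same pointer inside one expression are unsequenced, so the sketch gives each increment its own statement. The names tbl, build_tables(), step16_indexed() and step16_walk() are illustrative and not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint16_t tbl[16][256];

static void build_tables(void)
{
	unsigned int b, k, bit;

	for (b = 0; b < 256; b++) {
		uint16_t crc = (uint16_t)(b << 8);

		for (bit = 0; bit < 8; bit++)
			crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x8BB7u)
					      : (uint16_t)(crc << 1);
		tbl[0][b] = crc;
	}
	for (k = 1; k < 16; k++)
		for (b = 0; b < 256; b++)
			tbl[k][b] = (uint16_t)((tbl[k - 1][b] << 8) ^
					       tbl[0][tbl[k - 1][b] >> 8]);
}

/* (crc >> 0) is crc itself; both forms keep only the low byte. */
static uint8_t low_byte_shift0(uint16_t crc) { return (uint8_t)(crc >> 0); }
static uint8_t low_byte_plain(uint16_t crc)  { return (uint8_t)crc; }

/* One 16-byte step with indexed loads, as in the patch. */
static uint16_t step16_indexed(uint16_t crc, const uint8_t *p)
{
	uint16_t acc = tbl[15][p[0] ^ (uint8_t)(crc >> 8)] ^
		       tbl[14][p[1] ^ (uint8_t)crc];
	unsigned int k;

	assert(p != NULL);
	for (k = 2; k < 16; k++)
		acc ^= tbl[15 - k][p[k]];
	return acc;
}

/* The same step walking the pointer, one increment per statement. */
static uint16_t step16_walk(uint16_t crc, const uint8_t *p)
{
	uint16_t acc;
	int k;

	assert(p != NULL);
	acc = tbl[15][*p ^ (uint8_t)(crc >> 8)];
	acc ^= tbl[14][*++p ^ (uint8_t)crc];
	for (k = 13; k >= 0; k--)
		acc ^= tbl[k][*++p];
	return acc;
}
```

Both step functions return the same value for the same 16 bytes; whether either form is faster would need measuring on the target compiler.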

2018-08-11 00:11:24

by Joe Perches

Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.

On Fri, 2018-08-10 at 16:02 -0400, Nicolas Pitre wrote:
> On Fri, 10 Aug 2018, Joe Perches wrote:
>
> > On Fri, 2018-08-10 at 14:12 -0500, Jeff Lien wrote:
> > > This patch provides a performance improvement for the CRC16 calculations done in read/write
> > > workloads using the T10 Type 1/2/3 guard field. For example, today with sequential write
> > > workloads (one thread/CPU of IO) we consume 100% of the CPU because of the CRC16 computation
> > > bottleneck. Today's block devices are considerably faster, but the CRC16 calculation prevents
> > > folks from utilizing the throughput of such devices. To speed up this calculation and expose
> > > the block device throughput, we slice the old single byte for loop into a 16 byte for loop,
> > > with a larger CRC table to match. The result has shown 5x performance improvements on various
> > > big endian and little endian systems running the 4.18.0 kernel version.
> >
> > Thanks.
> >
> > This seems a sensible tradeoff for the 4k text size increase.
>
> More like 7.5KB. Would be best if this was configurable so the small
> version remained available.

Maybe something like: (compiled, untested)
---
crypto/Kconfig | 10 +
crypto/crct10dif_common.c | 543 +++++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 549 insertions(+), 4 deletions(-)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index f3e40ac56d93..88d9d17bb18a 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -618,6 +618,16 @@ config CRYPTO_CRCT10DIF
a crypto transform. This allows for faster crc t10 diff
transforms to be used if they are available.

+config CRYPTO_CRCT10DIF_TABLE_SIZE
+ int "Size of CRCT10DIF crc tables (as a power of 2)"
+ depends on CRYPTO_CRCT10DIF
+ range 1 5
+ default 1 if EMBEDDED
+ default 5
+ help
+ Set the table size used by the CRYPTO_CRCT10DIF crc calculation.
+ Larger values use more memory and are faster.
+
config CRYPTO_CRCT10DIF_PCLMUL
tristate "CRCT10DIF PCLMULQDQ hardware acceleration"
depends on X86 && 64BIT && CRC_T10DIF
diff --git a/crypto/crct10dif_common.c b/crypto/crct10dif_common.c
index b2fab366f518..4eb1c50c3688 100644
--- a/crypto/crct10dif_common.c
+++ b/crypto/crct10dif_common.c
@@ -32,7 +32,8 @@
* x^16 + x^15 + x^11 + x^9 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1
* gt: 0x8bb7
*/
-static const __u16 t10_dif_crc_table[256] = {
+static const __u16 t10dif_crc_table[][256] = {
+ {
0x0000, 0x8BB7, 0x9CD9, 0x176E, 0xB205, 0x39B2, 0x2EDC, 0xA56B,
0xEFBD, 0x640A, 0x7364, 0xF8D3, 0x5DB8, 0xD60F, 0xC161, 0x4AD6,
0x54CD, 0xDF7A, 0xC814, 0x43A3, 0xE6C8, 0x6D7F, 0x7A11, 0xF1A6,
@@ -65,14 +66,548 @@ static const __u16 t10_dif_crc_table[256] = {
0xA415, 0x2FA2, 0x38CC, 0xB37B, 0x1610, 0x9DA7, 0x8AC9, 0x017E,
0x1F65, 0x94D2, 0x83BC, 0x080B, 0xAD60, 0x26D7, 0x31B9, 0xBA0E,
0xF0D8, 0x7B6F, 0x6C01, 0xE7B6, 0x42DD, 0xC96A, 0xDE04, 0x55B3
+ },
+#if CONFIG_CRYPTO_CRCT10DIF_TABLE_SIZE >= 2
+ {
+ 0x0000, 0x7562, 0xEAC4, 0x9FA6, 0x5E3F, 0x2B5D, 0xB4FB, 0xC199,
+ 0xBC7E, 0xC91C, 0x56BA, 0x23D8, 0xE241, 0x9723, 0x0885, 0x7DE7,
+ 0xF34B, 0x8629, 0x198F, 0x6CED, 0xAD74, 0xD816, 0x47B0, 0x32D2,
+ 0x4F35, 0x3A57, 0xA5F1, 0xD093, 0x110A, 0x6468, 0xFBCE, 0x8EAC,
+ 0x6D21, 0x1843, 0x87E5, 0xF287, 0x331E, 0x467C, 0xD9DA, 0xACB8,
+ 0xD15F, 0xA43D, 0x3B9B, 0x4EF9, 0x8F60, 0xFA02, 0x65A4, 0x10C6,
+ 0x9E6A, 0xEB08, 0x74AE, 0x01CC, 0xC055, 0xB537, 0x2A91, 0x5FF3,
+ 0x2214, 0x5776, 0xC8D0, 0xBDB2, 0x7C2B, 0x0949, 0x96EF, 0xE38D,
+ 0xDA42, 0xAF20, 0x3086, 0x45E4, 0x847D, 0xF11F, 0x6EB9, 0x1BDB,
+ 0x663C, 0x135E, 0x8CF8, 0xF99A, 0x3803, 0x4D61, 0xD2C7, 0xA7A5,
+ 0x2909, 0x5C6B, 0xC3CD, 0xB6AF, 0x7736, 0x0254, 0x9DF2, 0xE890,
+ 0x9577, 0xE015, 0x7FB3, 0x0AD1, 0xCB48, 0xBE2A, 0x218C, 0x54EE,
+ 0xB763, 0xC201, 0x5DA7, 0x28C5, 0xE95C, 0x9C3E, 0x0398, 0x76FA,
+ 0x0B1D, 0x7E7F, 0xE1D9, 0x94BB, 0x5522, 0x2040, 0xBFE6, 0xCA84,
+ 0x4428, 0x314A, 0xAEEC, 0xDB8E, 0x1A17, 0x6F75, 0xF0D3, 0x85B1,
+ 0xF856, 0x8D34, 0x1292, 0x67F0, 0xA669, 0xD30B, 0x4CAD, 0x39CF,
+ 0x3F33, 0x4A51, 0xD5F7, 0xA095, 0x610C, 0x146E, 0x8BC8, 0xFEAA,
+ 0x834D, 0xF62F, 0x6989, 0x1CEB, 0xDD72, 0xA810, 0x37B6, 0x42D4,
+ 0xCC78, 0xB91A, 0x26BC, 0x53DE, 0x9247, 0xE725, 0x7883, 0x0DE1,
+ 0x7006, 0x0564, 0x9AC2, 0xEFA0, 0x2E39, 0x5B5B, 0xC4FD, 0xB19F,
+ 0x5212, 0x2770, 0xB8D6, 0xCDB4, 0x0C2D, 0x794F, 0xE6E9, 0x938B,
+ 0xEE6C, 0x9B0E, 0x04A8, 0x71CA, 0xB053, 0xC531, 0x5A97, 0x2FF5,
+ 0xA159, 0xD43B, 0x4B9D, 0x3EFF, 0xFF66, 0x8A04, 0x15A2, 0x60C0,
+ 0x1D27, 0x6845, 0xF7E3, 0x8281, 0x4318, 0x367A, 0xA9DC, 0xDCBE,
+ 0xE571, 0x9013, 0x0FB5, 0x7AD7, 0xBB4E, 0xCE2C, 0x518A, 0x24E8,
+ 0x590F, 0x2C6D, 0xB3CB, 0xC6A9, 0x0730, 0x7252, 0xEDF4, 0x9896,
+ 0x163A, 0x6358, 0xFCFE, 0x899C, 0x4805, 0x3D67, 0xA2C1, 0xD7A3,
+ 0xAA44, 0xDF26, 0x4080, 0x35E2, 0xF47B, 0x8119, 0x1EBF, 0x6BDD,
+ 0x8850, 0xFD32, 0x6294, 0x17F6, 0xD66F, 0xA30D, 0x3CAB, 0x49C9,
+ 0x342E, 0x414C, 0xDEEA, 0xAB88, 0x6A11, 0x1F73, 0x80D5, 0xF5B7,
+ 0x7B1B, 0x0E79, 0x91DF, 0xE4BD, 0x2524, 0x5046, 0xCFE0, 0xBA82,
+ 0xC765, 0xB207, 0x2DA1, 0x58C3, 0x995A, 0xEC38, 0x739E, 0x06FC
+ },
+#endif
+#if CONFIG_CRYPTO_CRCT10DIF_TABLE_SIZE >= 3
+ {
+ 0x0000, 0x7E66, 0xFCCC, 0x82AA, 0x722F, 0x0C49, 0x8EE3, 0xF085,
+ 0xE45E, 0x9A38, 0x1892, 0x66F4, 0x9671, 0xE817, 0x6ABD, 0x14DB,
+ 0x430B, 0x3D6D, 0xBFC7, 0xC1A1, 0x3124, 0x4F42, 0xCDE8, 0xB38E,
+ 0xA755, 0xD933, 0x5B99, 0x25FF, 0xD57A, 0xAB1C, 0x29B6, 0x57D0,
+ 0x8616, 0xF870, 0x7ADA, 0x04BC, 0xF439, 0x8A5F, 0x08F5, 0x7693,
+ 0x6248, 0x1C2E, 0x9E84, 0xE0E2, 0x1067, 0x6E01, 0xECAB, 0x92CD,
+ 0xC51D, 0xBB7B, 0x39D1, 0x47B7, 0xB732, 0xC954, 0x4BFE, 0x3598,
+ 0x2143, 0x5F25, 0xDD8F, 0xA3E9, 0x536C, 0x2D0A, 0xAFA0, 0xD1C6,
+ 0x879B, 0xF9FD, 0x7B57, 0x0531, 0xF5B4, 0x8BD2, 0x0978, 0x771E,
+ 0x63C5, 0x1DA3, 0x9F09, 0xE16F, 0x11EA, 0x6F8C, 0xED26, 0x9340,
+ 0xC490, 0xBAF6, 0x385C, 0x463A, 0xB6BF, 0xC8D9, 0x4A73, 0x3415,
+ 0x20CE, 0x5EA8, 0xDC02, 0xA264, 0x52E1, 0x2C87, 0xAE2D, 0xD04B,
+ 0x018D, 0x7FEB, 0xFD41, 0x8327, 0x73A2, 0x0DC4, 0x8F6E, 0xF108,
+ 0xE5D3, 0x9BB5, 0x191F, 0x6779, 0x97FC, 0xE99A, 0x6B30, 0x1556,
+ 0x4286, 0x3CE0, 0xBE4A, 0xC02C, 0x30A9, 0x4ECF, 0xCC65, 0xB203,
+ 0xA6D8, 0xD8BE, 0x5A14, 0x2472, 0xD4F7, 0xAA91, 0x283B, 0x565D,
+ 0x8481, 0xFAE7, 0x784D, 0x062B, 0xF6AE, 0x88C8, 0x0A62, 0x7404,
+ 0x60DF, 0x1EB9, 0x9C13, 0xE275, 0x12F0, 0x6C96, 0xEE3C, 0x905A,
+ 0xC78A, 0xB9EC, 0x3B46, 0x4520, 0xB5A5, 0xCBC3, 0x4969, 0x370F,
+ 0x23D4, 0x5DB2, 0xDF18, 0xA17E, 0x51FB, 0x2F9D, 0xAD37, 0xD351,
+ 0x0297, 0x7CF1, 0xFE5B, 0x803D, 0x70B8, 0x0EDE, 0x8C74, 0xF212,
+ 0xE6C9, 0x98AF, 0x1A05, 0x6463, 0x94E6, 0xEA80, 0x682A, 0x164C,
+ 0x419C, 0x3FFA, 0xBD50, 0xC336, 0x33B3, 0x4DD5, 0xCF7F, 0xB119,
+ 0xA5C2, 0xDBA4, 0x590E, 0x2768, 0xD7ED, 0xA98B, 0x2B21, 0x5547,
+ 0x031A, 0x7D7C, 0xFFD6, 0x81B0, 0x7135, 0x0F53, 0x8DF9, 0xF39F,
+ 0xE744, 0x9922, 0x1B88, 0x65EE, 0x956B, 0xEB0D, 0x69A7, 0x17C1,
+ 0x4011, 0x3E77, 0xBCDD, 0xC2BB, 0x323E, 0x4C58, 0xCEF2, 0xB094,
+ 0xA44F, 0xDA29, 0x5883, 0x26E5, 0xD660, 0xA806, 0x2AAC, 0x54CA,
+ 0x850C, 0xFB6A, 0x79C0, 0x07A6, 0xF723, 0x8945, 0x0BEF, 0x7589,
+ 0x6152, 0x1F34, 0x9D9E, 0xE3F8, 0x137D, 0x6D1B, 0xEFB1, 0x91D7,
+ 0xC607, 0xB861, 0x3ACB, 0x44AD, 0xB428, 0xCA4E, 0x48E4, 0x3682,
+ 0x2259, 0x5C3F, 0xDE95, 0xA0F3, 0x5076, 0x2E10, 0xACBA, 0xD2DC
+ },
+ {
+ 0x0000, 0x82B5, 0x8EDD, 0x0C68, 0x960D, 0x14B8, 0x18D0, 0x9A65,
+ 0xA7AD, 0x2518, 0x2970, 0xABC5, 0x31A0, 0xB315, 0xBF7D, 0x3DC8,
+ 0xC4ED, 0x4658, 0x4A30, 0xC885, 0x52E0, 0xD055, 0xDC3D, 0x5E88,
+ 0x6340, 0xE1F5, 0xED9D, 0x6F28, 0xF54D, 0x77F8, 0x7B90, 0xF925,
+ 0x026D, 0x80D8, 0x8CB0, 0x0E05, 0x9460, 0x16D5, 0x1ABD, 0x9808,
+ 0xA5C0, 0x2775, 0x2B1D, 0xA9A8, 0x33CD, 0xB178, 0xBD10, 0x3FA5,
+ 0xC680, 0x4435, 0x485D, 0xCAE8, 0x508D, 0xD238, 0xDE50, 0x5CE5,
+ 0x612D, 0xE398, 0xEFF0, 0x6D45, 0xF720, 0x7595, 0x79FD, 0xFB48,
+ 0x04DA, 0x866F, 0x8A07, 0x08B2, 0x92D7, 0x1062, 0x1C0A, 0x9EBF,
+ 0xA377, 0x21C2, 0x2DAA, 0xAF1F, 0x357A, 0xB7CF, 0xBBA7, 0x3912,
+ 0xC037, 0x4282, 0x4EEA, 0xCC5F, 0x563A, 0xD48F, 0xD8E7, 0x5A52,
+ 0x679A, 0xE52F, 0xE947, 0x6BF2, 0xF197, 0x7322, 0x7F4A, 0xFDFF,
+ 0x06B7, 0x8402, 0x886A, 0x0ADF, 0x90BA, 0x120F, 0x1E67, 0x9CD2,
+ 0xA11A, 0x23AF, 0x2FC7, 0xAD72, 0x3717, 0xB5A2, 0xB9CA, 0x3B7F,
+ 0xC25A, 0x40EF, 0x4C87, 0xCE32, 0x5457, 0xD6E2, 0xDA8A, 0x583F,
+ 0x65F7, 0xE742, 0xEB2A, 0x699F, 0xF3FA, 0x714F, 0x7D27, 0xFF92,
+ 0x09B4, 0x8B01, 0x8769, 0x05DC, 0x9FB9, 0x1D0C, 0x1164, 0x93D1,
+ 0xAE19, 0x2CAC, 0x20C4, 0xA271, 0x3814, 0xBAA1, 0xB6C9, 0x347C,
+ 0xCD59, 0x4FEC, 0x4384, 0xC131, 0x5B54, 0xD9E1, 0xD589, 0x573C,
+ 0x6AF4, 0xE841, 0xE429, 0x669C, 0xFCF9, 0x7E4C, 0x7224, 0xF091,
+ 0x0BD9, 0x896C, 0x8504, 0x07B1, 0x9DD4, 0x1F61, 0x1309, 0x91BC,
+ 0xAC74, 0x2EC1, 0x22A9, 0xA01C, 0x3A79, 0xB8CC, 0xB4A4, 0x3611,
+ 0xCF34, 0x4D81, 0x41E9, 0xC35C, 0x5939, 0xDB8C, 0xD7E4, 0x5551,
+ 0x6899, 0xEA2C, 0xE644, 0x64F1, 0xFE94, 0x7C21, 0x7049, 0xF2FC,
+ 0x0D6E, 0x8FDB, 0x83B3, 0x0106, 0x9B63, 0x19D6, 0x15BE, 0x970B,
+ 0xAAC3, 0x2876, 0x241E, 0xA6AB, 0x3CCE, 0xBE7B, 0xB213, 0x30A6,
+ 0xC983, 0x4B36, 0x475E, 0xC5EB, 0x5F8E, 0xDD3B, 0xD153, 0x53E6,
+ 0x6E2E, 0xEC9B, 0xE0F3, 0x6246, 0xF823, 0x7A96, 0x76FE, 0xF44B,
+ 0x0F03, 0x8DB6, 0x81DE, 0x036B, 0x990E, 0x1BBB, 0x17D3, 0x9566,
+ 0xA8AE, 0x2A1B, 0x2673, 0xA4C6, 0x3EA3, 0xBC16, 0xB07E, 0x32CB,
+ 0xCBEE, 0x495B, 0x4533, 0xC786, 0x5DE3, 0xDF56, 0xD33E, 0x518B,
+ 0x6C43, 0xEEF6, 0xE29E, 0x602B, 0xFA4E, 0x78FB, 0x7493, 0xF626
+ },
+#endif
+#if CONFIG_CRYPTO_CRCT10DIF_TABLE_SIZE >= 4
+ {
+ 0x0000, 0x1368, 0x26D0, 0x35B8, 0x4DA0, 0x5EC8, 0x6B70, 0x7818,
+ 0x9B40, 0x8828, 0xBD90, 0xAEF8, 0xD6E0, 0xC588, 0xF030, 0xE358,
+ 0xBD37, 0xAE5F, 0x9BE7, 0x888F, 0xF097, 0xE3FF, 0xD647, 0xC52F,
+ 0x2677, 0x351F, 0x00A7, 0x13CF, 0x6BD7, 0x78BF, 0x4D07, 0x5E6F,
+ 0xF1D9, 0xE2B1, 0xD709, 0xC461, 0xBC79, 0xAF11, 0x9AA9, 0x89C1,
+ 0x6A99, 0x79F1, 0x4C49, 0x5F21, 0x2739, 0x3451, 0x01E9, 0x1281,
+ 0x4CEE, 0x5F86, 0x6A3E, 0x7956, 0x014E, 0x1226, 0x279E, 0x34F6,
+ 0xD7AE, 0xC4C6, 0xF17E, 0xE216, 0x9A0E, 0x8966, 0xBCDE, 0xAFB6,
+ 0x6805, 0x7B6D, 0x4ED5, 0x5DBD, 0x25A5, 0x36CD, 0x0375, 0x101D,
+ 0xF345, 0xE02D, 0xD595, 0xC6FD, 0xBEE5, 0xAD8D, 0x9835, 0x8B5D,
+ 0xD532, 0xC65A, 0xF3E2, 0xE08A, 0x9892, 0x8BFA, 0xBE42, 0xAD2A,
+ 0x4E72, 0x5D1A, 0x68A2, 0x7BCA, 0x03D2, 0x10BA, 0x2502, 0x366A,
+ 0x99DC, 0x8AB4, 0xBF0C, 0xAC64, 0xD47C, 0xC714, 0xF2AC, 0xE1C4,
+ 0x029C, 0x11F4, 0x244C, 0x3724, 0x4F3C, 0x5C54, 0x69EC, 0x7A84,
+ 0x24EB, 0x3783, 0x023B, 0x1153, 0x694B, 0x7A23, 0x4F9B, 0x5CF3,
+ 0xBFAB, 0xACC3, 0x997B, 0x8A13, 0xF20B, 0xE163, 0xD4DB, 0xC7B3,
+ 0xD00A, 0xC362, 0xF6DA, 0xE5B2, 0x9DAA, 0x8EC2, 0xBB7A, 0xA812,
+ 0x4B4A, 0x5822, 0x6D9A, 0x7EF2, 0x06EA, 0x1582, 0x203A, 0x3352,
+ 0x6D3D, 0x7E55, 0x4BED, 0x5885, 0x209D, 0x33F5, 0x064D, 0x1525,
+ 0xF67D, 0xE515, 0xD0AD, 0xC3C5, 0xBBDD, 0xA8B5, 0x9D0D, 0x8E65,
+ 0x21D3, 0x32BB, 0x0703, 0x146B, 0x6C73, 0x7F1B, 0x4AA3, 0x59CB,
+ 0xBA93, 0xA9FB, 0x9C43, 0x8F2B, 0xF733, 0xE45B, 0xD1E3, 0xC28B,
+ 0x9CE4, 0x8F8C, 0xBA34, 0xA95C, 0xD144, 0xC22C, 0xF794, 0xE4FC,
+ 0x07A4, 0x14CC, 0x2174, 0x321C, 0x4A04, 0x596C, 0x6CD4, 0x7FBC,
+ 0xB80F, 0xAB67, 0x9EDF, 0x8DB7, 0xF5AF, 0xE6C7, 0xD37F, 0xC017,
+ 0x234F, 0x3027, 0x059F, 0x16F7, 0x6EEF, 0x7D87, 0x483F, 0x5B57,
+ 0x0538, 0x1650, 0x23E8, 0x3080, 0x4898, 0x5BF0, 0x6E48, 0x7D20,
+ 0x9E78, 0x8D10, 0xB8A8, 0xABC0, 0xD3D8, 0xC0B0, 0xF508, 0xE660,
+ 0x49D6, 0x5ABE, 0x6F06, 0x7C6E, 0x0476, 0x171E, 0x22A6, 0x31CE,
+ 0xD296, 0xC1FE, 0xF446, 0xE72E, 0x9F36, 0x8C5E, 0xB9E6, 0xAA8E,
+ 0xF4E1, 0xE789, 0xD231, 0xC159, 0xB941, 0xAA29, 0x9F91, 0x8CF9,
+ 0x6FA1, 0x7CC9, 0x4971, 0x5A19, 0x2201, 0x3169, 0x04D1, 0x17B9
+ },
+ {
+ 0x0000, 0x2BA3, 0x5746, 0x7CE5, 0xAE8C, 0x852F, 0xF9CA, 0xD269,
+ 0xD6AF, 0xFD0C, 0x81E9, 0xAA4A, 0x7823, 0x5380, 0x2F65, 0x04C6,
+ 0x26E9, 0x0D4A, 0x71AF, 0x5A0C, 0x8865, 0xA3C6, 0xDF23, 0xF480,
+ 0xF046, 0xDBE5, 0xA700, 0x8CA3, 0x5ECA, 0x7569, 0x098C, 0x222F,
+ 0x4DD2, 0x6671, 0x1A94, 0x3137, 0xE35E, 0xC8FD, 0xB418, 0x9FBB,
+ 0x9B7D, 0xB0DE, 0xCC3B, 0xE798, 0x35F1, 0x1E52, 0x62B7, 0x4914,
+ 0x6B3B, 0x4098, 0x3C7D, 0x17DE, 0xC5B7, 0xEE14, 0x92F1, 0xB952,
+ 0xBD94, 0x9637, 0xEAD2, 0xC171, 0x1318, 0x38BB, 0x445E, 0x6FFD,
+ 0x9BA4, 0xB007, 0xCCE2, 0xE741, 0x3528, 0x1E8B, 0x626E, 0x49CD,
+ 0x4D0B, 0x66A8, 0x1A4D, 0x31EE, 0xE387, 0xC824, 0xB4C1, 0x9F62,
+ 0xBD4D, 0x96EE, 0xEA0B, 0xC1A8, 0x13C1, 0x3862, 0x4487, 0x6F24,
+ 0x6BE2, 0x4041, 0x3CA4, 0x1707, 0xC56E, 0xEECD, 0x9228, 0xB98B,
+ 0xD676, 0xFDD5, 0x8130, 0xAA93, 0x78FA, 0x5359, 0x2FBC, 0x041F,
+ 0x00D9, 0x2B7A, 0x579F, 0x7C3C, 0xAE55, 0x85F6, 0xF913, 0xD2B0,
+ 0xF09F, 0xDB3C, 0xA7D9, 0x8C7A, 0x5E13, 0x75B0, 0x0955, 0x22F6,
+ 0x2630, 0x0D93, 0x7176, 0x5AD5, 0x88BC, 0xA31F, 0xDFFA, 0xF459,
+ 0xBCFF, 0x975C, 0xEBB9, 0xC01A, 0x1273, 0x39D0, 0x4535, 0x6E96,
+ 0x6A50, 0x41F3, 0x3D16, 0x16B5, 0xC4DC, 0xEF7F, 0x939A, 0xB839,
+ 0x9A16, 0xB1B5, 0xCD50, 0xE6F3, 0x349A, 0x1F39, 0x63DC, 0x487F,
+ 0x4CB9, 0x671A, 0x1BFF, 0x305C, 0xE235, 0xC996, 0xB573, 0x9ED0,
+ 0xF12D, 0xDA8E, 0xA66B, 0x8DC8, 0x5FA1, 0x7402, 0x08E7, 0x2344,
+ 0x2782, 0x0C21, 0x70C4, 0x5B67, 0x890E, 0xA2AD, 0xDE48, 0xF5EB,
+ 0xD7C4, 0xFC67, 0x8082, 0xAB21, 0x7948, 0x52EB, 0x2E0E, 0x05AD,
+ 0x016B, 0x2AC8, 0x562D, 0x7D8E, 0xAFE7, 0x8444, 0xF8A1, 0xD302,
+ 0x275B, 0x0CF8, 0x701D, 0x5BBE, 0x89D7, 0xA274, 0xDE91, 0xF532,
+ 0xF1F4, 0xDA57, 0xA6B2, 0x8D11, 0x5F78, 0x74DB, 0x083E, 0x239D,
+ 0x01B2, 0x2A11, 0x56F4, 0x7D57, 0xAF3E, 0x849D, 0xF878, 0xD3DB,
+ 0xD71D, 0xFCBE, 0x805B, 0xABF8, 0x7991, 0x5232, 0x2ED7, 0x0574,
+ 0x6A89, 0x412A, 0x3DCF, 0x166C, 0xC405, 0xEFA6, 0x9343, 0xB8E0,
+ 0xBC26, 0x9785, 0xEB60, 0xC0C3, 0x12AA, 0x3909, 0x45EC, 0x6E4F,
+ 0x4C60, 0x67C3, 0x1B26, 0x3085, 0xE2EC, 0xC94F, 0xB5AA, 0x9E09,
+ 0x9ACF, 0xB16C, 0xCD89, 0xE62A, 0x3443, 0x1FE0, 0x6305, 0x48A6
+ },
+ {
+ 0x0000, 0xF249, 0x6F25, 0x9D6C, 0xDE4A, 0x2C03, 0xB16F, 0x4326,
+ 0x3723, 0xC56A, 0x5806, 0xAA4F, 0xE969, 0x1B20, 0x864C, 0x7405,
+ 0x6E46, 0x9C0F, 0x0163, 0xF32A, 0xB00C, 0x4245, 0xDF29, 0x2D60,
+ 0x5965, 0xAB2C, 0x3640, 0xC409, 0x872F, 0x7566, 0xE80A, 0x1A43,
+ 0xDC8C, 0x2EC5, 0xB3A9, 0x41E0, 0x02C6, 0xF08F, 0x6DE3, 0x9FAA,
+ 0xEBAF, 0x19E6, 0x848A, 0x76C3, 0x35E5, 0xC7AC, 0x5AC0, 0xA889,
+ 0xB2CA, 0x4083, 0xDDEF, 0x2FA6, 0x6C80, 0x9EC9, 0x03A5, 0xF1EC,
+ 0x85E9, 0x77A0, 0xEACC, 0x1885, 0x5BA3, 0xA9EA, 0x3486, 0xC6CF,
+ 0x32AF, 0xC0E6, 0x5D8A, 0xAFC3, 0xECE5, 0x1EAC, 0x83C0, 0x7189,
+ 0x058C, 0xF7C5, 0x6AA9, 0x98E0, 0xDBC6, 0x298F, 0xB4E3, 0x46AA,
+ 0x5CE9, 0xAEA0, 0x33CC, 0xC185, 0x82A3, 0x70EA, 0xED86, 0x1FCF,
+ 0x6BCA, 0x9983, 0x04EF, 0xF6A6, 0xB580, 0x47C9, 0xDAA5, 0x28EC,
+ 0xEE23, 0x1C6A, 0x8106, 0x734F, 0x3069, 0xC220, 0x5F4C, 0xAD05,
+ 0xD900, 0x2B49, 0xB625, 0x446C, 0x074A, 0xF503, 0x686F, 0x9A26,
+ 0x8065, 0x722C, 0xEF40, 0x1D09, 0x5E2F, 0xAC66, 0x310A, 0xC343,
+ 0xB746, 0x450F, 0xD863, 0x2A2A, 0x690C, 0x9B45, 0x0629, 0xF460,
+ 0x655E, 0x9717, 0x0A7B, 0xF832, 0xBB14, 0x495D, 0xD431, 0x2678,
+ 0x527D, 0xA034, 0x3D58, 0xCF11, 0x8C37, 0x7E7E, 0xE312, 0x115B,
+ 0x0B18, 0xF951, 0x643D, 0x9674, 0xD552, 0x271B, 0xBA77, 0x483E,
+ 0x3C3B, 0xCE72, 0x531E, 0xA157, 0xE271, 0x1038, 0x8D54, 0x7F1D,
+ 0xB9D2, 0x4B9B, 0xD6F7, 0x24BE, 0x6798, 0x95D1, 0x08BD, 0xFAF4,
+ 0x8EF1, 0x7CB8, 0xE1D4, 0x139D, 0x50BB, 0xA2F2, 0x3F9E, 0xCDD7,
+ 0xD794, 0x25DD, 0xB8B1, 0x4AF8, 0x09DE, 0xFB97, 0x66FB, 0x94B2,
+ 0xE0B7, 0x12FE, 0x8F92, 0x7DDB, 0x3EFD, 0xCCB4, 0x51D8, 0xA391,
+ 0x57F1, 0xA5B8, 0x38D4, 0xCA9D, 0x89BB, 0x7BF2, 0xE69E, 0x14D7,
+ 0x60D2, 0x929B, 0x0FF7, 0xFDBE, 0xBE98, 0x4CD1, 0xD1BD, 0x23F4,
+ 0x39B7, 0xCBFE, 0x5692, 0xA4DB, 0xE7FD, 0x15B4, 0x88D8, 0x7A91,
+ 0x0E94, 0xFCDD, 0x61B1, 0x93F8, 0xD0DE, 0x2297, 0xBFFB, 0x4DB2,
+ 0x8B7D, 0x7934, 0xE458, 0x1611, 0x5537, 0xA77E, 0x3A12, 0xC85B,
+ 0xBC5E, 0x4E17, 0xD37B, 0x2132, 0x6214, 0x905D, 0x0D31, 0xFF78,
+ 0xE53B, 0x1772, 0x8A1E, 0x7857, 0x3B71, 0xC938, 0x5454, 0xA61D,
+ 0xD218, 0x2051, 0xBD3D, 0x4F74, 0x0C52, 0xFE1B, 0x6377, 0x913E
+ },
+ {
+ 0x0000, 0xCABC, 0x1ECF, 0xD473, 0x3D9E, 0xF722, 0x2351, 0xE9ED,
+ 0x7B3C, 0xB180, 0x65F3, 0xAF4F, 0x46A2, 0x8C1E, 0x586D, 0x92D1,
+ 0xF678, 0x3CC4, 0xE8B7, 0x220B, 0xCBE6, 0x015A, 0xD529, 0x1F95,
+ 0x8D44, 0x47F8, 0x938B, 0x5937, 0xB0DA, 0x7A66, 0xAE15, 0x64A9,
+ 0x6747, 0xADFB, 0x7988, 0xB334, 0x5AD9, 0x9065, 0x4416, 0x8EAA,
+ 0x1C7B, 0xD6C7, 0x02B4, 0xC808, 0x21E5, 0xEB59, 0x3F2A, 0xF596,
+ 0x913F, 0x5B83, 0x8FF0, 0x454C, 0xACA1, 0x661D, 0xB26E, 0x78D2,
+ 0xEA03, 0x20BF, 0xF4CC, 0x3E70, 0xD79D, 0x1D21, 0xC952, 0x03EE,
+ 0xCE8E, 0x0432, 0xD041, 0x1AFD, 0xF310, 0x39AC, 0xEDDF, 0x2763,
+ 0xB5B2, 0x7F0E, 0xAB7D, 0x61C1, 0x882C, 0x4290, 0x96E3, 0x5C5F,
+ 0x38F6, 0xF24A, 0x2639, 0xEC85, 0x0568, 0xCFD4, 0x1BA7, 0xD11B,
+ 0x43CA, 0x8976, 0x5D05, 0x97B9, 0x7E54, 0xB4E8, 0x609B, 0xAA27,
+ 0xA9C9, 0x6375, 0xB706, 0x7DBA, 0x9457, 0x5EEB, 0x8A98, 0x4024,
+ 0xD2F5, 0x1849, 0xCC3A, 0x0686, 0xEF6B, 0x25D7, 0xF1A4, 0x3B18,
+ 0x5FB1, 0x950D, 0x417E, 0x8BC2, 0x622F, 0xA893, 0x7CE0, 0xB65C,
+ 0x248D, 0xEE31, 0x3A42, 0xF0FE, 0x1913, 0xD3AF, 0x07DC, 0xCD60,
+ 0x16AB, 0xDC17, 0x0864, 0xC2D8, 0x2B35, 0xE189, 0x35FA, 0xFF46,
+ 0x6D97, 0xA72B, 0x7358, 0xB9E4, 0x5009, 0x9AB5, 0x4EC6, 0x847A,
+ 0xE0D3, 0x2A6F, 0xFE1C, 0x34A0, 0xDD4D, 0x17F1, 0xC382, 0x093E,
+ 0x9BEF, 0x5153, 0x8520, 0x4F9C, 0xA671, 0x6CCD, 0xB8BE, 0x7202,
+ 0x71EC, 0xBB50, 0x6F23, 0xA59F, 0x4C72, 0x86CE, 0x52BD, 0x9801,
+ 0x0AD0, 0xC06C, 0x141F, 0xDEA3, 0x374E, 0xFDF2, 0x2981, 0xE33D,
+ 0x8794, 0x4D28, 0x995B, 0x53E7, 0xBA0A, 0x70B6, 0xA4C5, 0x6E79,
+ 0xFCA8, 0x3614, 0xE267, 0x28DB, 0xC136, 0x0B8A, 0xDFF9, 0x1545,
+ 0xD825, 0x1299, 0xC6EA, 0x0C56, 0xE5BB, 0x2F07, 0xFB74, 0x31C8,
+ 0xA319, 0x69A5, 0xBDD6, 0x776A, 0x9E87, 0x543B, 0x8048, 0x4AF4,
+ 0x2E5D, 0xE4E1, 0x3092, 0xFA2E, 0x13C3, 0xD97F, 0x0D0C, 0xC7B0,
+ 0x5561, 0x9FDD, 0x4BAE, 0x8112, 0x68FF, 0xA243, 0x7630, 0xBC8C,
+ 0xBF62, 0x75DE, 0xA1AD, 0x6B11, 0x82FC, 0x4840, 0x9C33, 0x568F,
+ 0xC45E, 0x0EE2, 0xDA91, 0x102D, 0xF9C0, 0x337C, 0xE70F, 0x2DB3,
+ 0x491A, 0x83A6, 0x57D5, 0x9D69, 0x7484, 0xBE38, 0x6A4B, 0xA0F7,
+ 0x3226, 0xF89A, 0x2CE9, 0xE655, 0x0FB8, 0xC504, 0x1177, 0xDBCB
+ },
+#endif
+#if CONFIG_CRYPTO_CRCT10DIF_TABLE_SIZE >= 5
+ {
+ 0x0000, 0x2D56, 0x5AAC, 0x77FA, 0xB558, 0x980E, 0xEFF4, 0xC2A2,
+ 0xE107, 0xCC51, 0xBBAB, 0x96FD, 0x545F, 0x7909, 0x0EF3, 0x23A5,
+ 0x49B9, 0x64EF, 0x1315, 0x3E43, 0xFCE1, 0xD1B7, 0xA64D, 0x8B1B,
+ 0xA8BE, 0x85E8, 0xF212, 0xDF44, 0x1DE6, 0x30B0, 0x474A, 0x6A1C,
+ 0x9372, 0xBE24, 0xC9DE, 0xE488, 0x262A, 0x0B7C, 0x7C86, 0x51D0,
+ 0x7275, 0x5F23, 0x28D9, 0x058F, 0xC72D, 0xEA7B, 0x9D81, 0xB0D7,
+ 0xDACB, 0xF79D, 0x8067, 0xAD31, 0x6F93, 0x42C5, 0x353F, 0x1869,
+ 0x3BCC, 0x169A, 0x6160, 0x4C36, 0x8E94, 0xA3C2, 0xD438, 0xF96E,
+ 0xAD53, 0x8005, 0xF7FF, 0xDAA9, 0x180B, 0x355D, 0x42A7, 0x6FF1,
+ 0x4C54, 0x6102, 0x16F8, 0x3BAE, 0xF90C, 0xD45A, 0xA3A0, 0x8EF6,
+ 0xE4EA, 0xC9BC, 0xBE46, 0x9310, 0x51B2, 0x7CE4, 0x0B1E, 0x2648,
+ 0x05ED, 0x28BB, 0x5F41, 0x7217, 0xB0B5, 0x9DE3, 0xEA19, 0xC74F,
+ 0x3E21, 0x1377, 0x648D, 0x49DB, 0x8B79, 0xA62F, 0xD1D5, 0xFC83,
+ 0xDF26, 0xF270, 0x858A, 0xA8DC, 0x6A7E, 0x4728, 0x30D2, 0x1D84,
+ 0x7798, 0x5ACE, 0x2D34, 0x0062, 0xC2C0, 0xEF96, 0x986C, 0xB53A,
+ 0x969F, 0xBBC9, 0xCC33, 0xE165, 0x23C7, 0x0E91, 0x796B, 0x543D,
+ 0xD111, 0xFC47, 0x8BBD, 0xA6EB, 0x6449, 0x491F, 0x3EE5, 0x13B3,
+ 0x3016, 0x1D40, 0x6ABA, 0x47EC, 0x854E, 0xA818, 0xDFE2, 0xF2B4,
+ 0x98A8, 0xB5FE, 0xC204, 0xEF52, 0x2DF0, 0x00A6, 0x775C, 0x5A0A,
+ 0x79AF, 0x54F9, 0x2303, 0x0E55, 0xCCF7, 0xE1A1, 0x965B, 0xBB0D,
+ 0x4263, 0x6F35, 0x18CF, 0x3599, 0xF73B, 0xDA6D, 0xAD97, 0x80C1,
+ 0xA364, 0x8E32, 0xF9C8, 0xD49E, 0x163C, 0x3B6A, 0x4C90, 0x61C6,
+ 0x0BDA, 0x268C, 0x5176, 0x7C20, 0xBE82, 0x93D4, 0xE42E, 0xC978,
+ 0xEADD, 0xC78B, 0xB071, 0x9D27, 0x5F85, 0x72D3, 0x0529, 0x287F,
+ 0x7C42, 0x5114, 0x26EE, 0x0BB8, 0xC91A, 0xE44C, 0x93B6, 0xBEE0,
+ 0x9D45, 0xB013, 0xC7E9, 0xEABF, 0x281D, 0x054B, 0x72B1, 0x5FE7,
+ 0x35FB, 0x18AD, 0x6F57, 0x4201, 0x80A3, 0xADF5, 0xDA0F, 0xF759,
+ 0xD4FC, 0xF9AA, 0x8E50, 0xA306, 0x61A4, 0x4CF2, 0x3B08, 0x165E,
+ 0xEF30, 0xC266, 0xB59C, 0x98CA, 0x5A68, 0x773E, 0x00C4, 0x2D92,
+ 0x0E37, 0x2361, 0x549B, 0x79CD, 0xBB6F, 0x9639, 0xE1C3, 0xCC95,
+ 0xA689, 0x8BDF, 0xFC25, 0xD173, 0x13D1, 0x3E87, 0x497D, 0x642B,
+ 0x478E, 0x6AD8, 0x1D22, 0x3074, 0xF2D6, 0xDF80, 0xA87A, 0x852C
+ },
+ {
+ 0x0000, 0x2995, 0x532A, 0x7ABF, 0xA654, 0x8FC1, 0xF57E, 0xDCEB,
+ 0xC71F, 0xEE8A, 0x9435, 0xBDA0, 0x614B, 0x48DE, 0x3261, 0x1BF4,
+ 0x0589, 0x2C1C, 0x56A3, 0x7F36, 0xA3DD, 0x8A48, 0xF0F7, 0xD962,
+ 0xC296, 0xEB03, 0x91BC, 0xB829, 0x64C2, 0x4D57, 0x37E8, 0x1E7D,
+ 0x0B12, 0x2287, 0x5838, 0x71AD, 0xAD46, 0x84D3, 0xFE6C, 0xD7F9,
+ 0xCC0D, 0xE598, 0x9F27, 0xB6B2, 0x6A59, 0x43CC, 0x3973, 0x10E6,
+ 0x0E9B, 0x270E, 0x5DB1, 0x7424, 0xA8CF, 0x815A, 0xFBE5, 0xD270,
+ 0xC984, 0xE011, 0x9AAE, 0xB33B, 0x6FD0, 0x4645, 0x3CFA, 0x156F,
+ 0x1624, 0x3FB1, 0x450E, 0x6C9B, 0xB070, 0x99E5, 0xE35A, 0xCACF,
+ 0xD13B, 0xF8AE, 0x8211, 0xAB84, 0x776F, 0x5EFA, 0x2445, 0x0DD0,
+ 0x13AD, 0x3A38, 0x4087, 0x6912, 0xB5F9, 0x9C6C, 0xE6D3, 0xCF46,
+ 0xD4B2, 0xFD27, 0x8798, 0xAE0D, 0x72E6, 0x5B73, 0x21CC, 0x0859,
+ 0x1D36, 0x34A3, 0x4E1C, 0x6789, 0xBB62, 0x92F7, 0xE848, 0xC1DD,
+ 0xDA29, 0xF3BC, 0x8903, 0xA096, 0x7C7D, 0x55E8, 0x2F57, 0x06C2,
+ 0x18BF, 0x312A, 0x4B95, 0x6200, 0xBEEB, 0x977E, 0xEDC1, 0xC454,
+ 0xDFA0, 0xF635, 0x8C8A, 0xA51F, 0x79F4, 0x5061, 0x2ADE, 0x034B,
+ 0x2C48, 0x05DD, 0x7F62, 0x56F7, 0x8A1C, 0xA389, 0xD936, 0xF0A3,
+ 0xEB57, 0xC2C2, 0xB87D, 0x91E8, 0x4D03, 0x6496, 0x1E29, 0x37BC,
+ 0x29C1, 0x0054, 0x7AEB, 0x537E, 0x8F95, 0xA600, 0xDCBF, 0xF52A,
+ 0xEEDE, 0xC74B, 0xBDF4, 0x9461, 0x488A, 0x611F, 0x1BA0, 0x3235,
+ 0x275A, 0x0ECF, 0x7470, 0x5DE5, 0x810E, 0xA89B, 0xD224, 0xFBB1,
+ 0xE045, 0xC9D0, 0xB36F, 0x9AFA, 0x4611, 0x6F84, 0x153B, 0x3CAE,
+ 0x22D3, 0x0B46, 0x71F9, 0x586C, 0x8487, 0xAD12, 0xD7AD, 0xFE38,
+ 0xE5CC, 0xCC59, 0xB6E6, 0x9F73, 0x4398, 0x6A0D, 0x10B2, 0x3927,
+ 0x3A6C, 0x13F9, 0x6946, 0x40D3, 0x9C38, 0xB5AD, 0xCF12, 0xE687,
+ 0xFD73, 0xD4E6, 0xAE59, 0x87CC, 0x5B27, 0x72B2, 0x080D, 0x2198,
+ 0x3FE5, 0x1670, 0x6CCF, 0x455A, 0x99B1, 0xB024, 0xCA9B, 0xE30E,
+ 0xF8FA, 0xD16F, 0xABD0, 0x8245, 0x5EAE, 0x773B, 0x0D84, 0x2411,
+ 0x317E, 0x18EB, 0x6254, 0x4BC1, 0x972A, 0xBEBF, 0xC400, 0xED95,
+ 0xF661, 0xDFF4, 0xA54B, 0x8CDE, 0x5035, 0x79A0, 0x031F, 0x2A8A,
+ 0x34F7, 0x1D62, 0x67DD, 0x4E48, 0x92A3, 0xBB36, 0xC189, 0xE81C,
+ 0xF3E8, 0xDA7D, 0xA0C2, 0x8957, 0x55BC, 0x7C29, 0x0696, 0x2F03
+ },
+ {
+ 0x0000, 0x5890, 0xB120, 0xE9B0, 0xE9F7, 0xB167, 0x58D7, 0x0047,
+ 0x5859, 0x00C9, 0xE979, 0xB1E9, 0xB1AE, 0xE93E, 0x008E, 0x581E,
+ 0xB0B2, 0xE822, 0x0192, 0x5902, 0x5945, 0x01D5, 0xE865, 0xB0F5,
+ 0xE8EB, 0xB07B, 0x59CB, 0x015B, 0x011C, 0x598C, 0xB03C, 0xE8AC,
+ 0xEAD3, 0xB243, 0x5BF3, 0x0363, 0x0324, 0x5BB4, 0xB204, 0xEA94,
+ 0xB28A, 0xEA1A, 0x03AA, 0x5B3A, 0x5B7D, 0x03ED, 0xEA5D, 0xB2CD,
+ 0x5A61, 0x02F1, 0xEB41, 0xB3D1, 0xB396, 0xEB06, 0x02B6, 0x5A26,
+ 0x0238, 0x5AA8, 0xB318, 0xEB88, 0xEBCF, 0xB35F, 0x5AEF, 0x027F,
+ 0x5E11, 0x0681, 0xEF31, 0xB7A1, 0xB7E6, 0xEF76, 0x06C6, 0x5E56,
+ 0x0648, 0x5ED8, 0xB768, 0xEFF8, 0xEFBF, 0xB72F, 0x5E9F, 0x060F,
+ 0xEEA3, 0xB633, 0x5F83, 0x0713, 0x0754, 0x5FC4, 0xB674, 0xEEE4,
+ 0xB6FA, 0xEE6A, 0x07DA, 0x5F4A, 0x5F0D, 0x079D, 0xEE2D, 0xB6BD,
+ 0xB4C2, 0xEC52, 0x05E2, 0x5D72, 0x5D35, 0x05A5, 0xEC15, 0xB485,
+ 0xEC9B, 0xB40B, 0x5DBB, 0x052B, 0x056C, 0x5DFC, 0xB44C, 0xECDC,
+ 0x0470, 0x5CE0, 0xB550, 0xEDC0, 0xED87, 0xB517, 0x5CA7, 0x0437,
+ 0x5C29, 0x04B9, 0xED09, 0xB599, 0xB5DE, 0xED4E, 0x04FE, 0x5C6E,
+ 0xBC22, 0xE4B2, 0x0D02, 0x5592, 0x55D5, 0x0D45, 0xE4F5, 0xBC65,
+ 0xE47B, 0xBCEB, 0x555B, 0x0DCB, 0x0D8C, 0x551C, 0xBCAC, 0xE43C,
+ 0x0C90, 0x5400, 0xBDB0, 0xE520, 0xE567, 0xBDF7, 0x5447, 0x0CD7,
+ 0x54C9, 0x0C59, 0xE5E9, 0xBD79, 0xBD3E, 0xE5AE, 0x0C1E, 0x548E,
+ 0x56F1, 0x0E61, 0xE7D1, 0xBF41, 0xBF06, 0xE796, 0x0E26, 0x56B6,
+ 0x0EA8, 0x5638, 0xBF88, 0xE718, 0xE75F, 0xBFCF, 0x567F, 0x0EEF,
+ 0xE643, 0xBED3, 0x5763, 0x0FF3, 0x0FB4, 0x5724, 0xBE94, 0xE604,
+ 0xBE1A, 0xE68A, 0x0F3A, 0x57AA, 0x57ED, 0x0F7D, 0xE6CD, 0xBE5D,
+ 0xE233, 0xBAA3, 0x5313, 0x0B83, 0x0BC4, 0x5354, 0xBAE4, 0xE274,
+ 0xBA6A, 0xE2FA, 0x0B4A, 0x53DA, 0x539D, 0x0B0D, 0xE2BD, 0xBA2D,
+ 0x5281, 0x0A11, 0xE3A1, 0xBB31, 0xBB76, 0xE3E6, 0x0A56, 0x52C6,
+ 0x0AD8, 0x5248, 0xBBF8, 0xE368, 0xE32F, 0xBBBF, 0x520F, 0x0A9F,
+ 0x08E0, 0x5070, 0xB9C0, 0xE150, 0xE117, 0xB987, 0x5037, 0x08A7,
+ 0x50B9, 0x0829, 0xE199, 0xB909, 0xB94E, 0xE1DE, 0x086E, 0x50FE,
+ 0xB852, 0xE0C2, 0x0972, 0x51E2, 0x51A5, 0x0935, 0xE085, 0xB815,
+ 0xE00B, 0xB89B, 0x512B, 0x09BB, 0x09FC, 0x516C, 0xB8DC, 0xE04C
+ },
+ {
+ 0x0000, 0xF3F3, 0x6C51, 0x9FA2, 0xD8A2, 0x2B51, 0xB4F3, 0x4700,
+ 0x3AF3, 0xC900, 0x56A2, 0xA551, 0xE251, 0x11A2, 0x8E00, 0x7DF3,
+ 0x75E6, 0x8615, 0x19B7, 0xEA44, 0xAD44, 0x5EB7, 0xC115, 0x32E6,
+ 0x4F15, 0xBCE6, 0x2344, 0xD0B7, 0x97B7, 0x6444, 0xFBE6, 0x0815,
+ 0xEBCC, 0x183F, 0x879D, 0x746E, 0x336E, 0xC09D, 0x5F3F, 0xACCC,
+ 0xD13F, 0x22CC, 0xBD6E, 0x4E9D, 0x099D, 0xFA6E, 0x65CC, 0x963F,
+ 0x9E2A, 0x6DD9, 0xF27B, 0x0188, 0x4688, 0xB57B, 0x2AD9, 0xD92A,
+ 0xA4D9, 0x572A, 0xC888, 0x3B7B, 0x7C7B, 0x8F88, 0x102A, 0xE3D9,
+ 0x5C2F, 0xAFDC, 0x307E, 0xC38D, 0x848D, 0x777E, 0xE8DC, 0x1B2F,
+ 0x66DC, 0x952F, 0x0A8D, 0xF97E, 0xBE7E, 0x4D8D, 0xD22F, 0x21DC,
+ 0x29C9, 0xDA3A, 0x4598, 0xB66B, 0xF16B, 0x0298, 0x9D3A, 0x6EC9,
+ 0x133A, 0xE0C9, 0x7F6B, 0x8C98, 0xCB98, 0x386B, 0xA7C9, 0x543A,
+ 0xB7E3, 0x4410, 0xDBB2, 0x2841, 0x6F41, 0x9CB2, 0x0310, 0xF0E3,
+ 0x8D10, 0x7EE3, 0xE141, 0x12B2, 0x55B2, 0xA641, 0x39E3, 0xCA10,
+ 0xC205, 0x31F6, 0xAE54, 0x5DA7, 0x1AA7, 0xE954, 0x76F6, 0x8505,
+ 0xF8F6, 0x0B05, 0x94A7, 0x6754, 0x2054, 0xD3A7, 0x4C05, 0xBFF6,
+ 0xB85E, 0x4BAD, 0xD40F, 0x27FC, 0x60FC, 0x930F, 0x0CAD, 0xFF5E,
+ 0x82AD, 0x715E, 0xEEFC, 0x1D0F, 0x5A0F, 0xA9FC, 0x365E, 0xC5AD,
+ 0xCDB8, 0x3E4B, 0xA1E9, 0x521A, 0x151A, 0xE6E9, 0x794B, 0x8AB8,
+ 0xF74B, 0x04B8, 0x9B1A, 0x68E9, 0x2FE9, 0xDC1A, 0x43B8, 0xB04B,
+ 0x5392, 0xA061, 0x3FC3, 0xCC30, 0x8B30, 0x78C3, 0xE761, 0x1492,
+ 0x6961, 0x9A92, 0x0530, 0xF6C3, 0xB1C3, 0x4230, 0xDD92, 0x2E61,
+ 0x2674, 0xD587, 0x4A25, 0xB9D6, 0xFED6, 0x0D25, 0x9287, 0x6174,
+ 0x1C87, 0xEF74, 0x70D6, 0x8325, 0xC425, 0x37D6, 0xA874, 0x5B87,
+ 0xE471, 0x1782, 0x8820, 0x7BD3, 0x3CD3, 0xCF20, 0x5082, 0xA371,
+ 0xDE82, 0x2D71, 0xB2D3, 0x4120, 0x0620, 0xF5D3, 0x6A71, 0x9982,
+ 0x9197, 0x6264, 0xFDC6, 0x0E35, 0x4935, 0xBAC6, 0x2564, 0xD697,
+ 0xAB64, 0x5897, 0xC735, 0x34C6, 0x73C6, 0x8035, 0x1F97, 0xEC64,
+ 0x0FBD, 0xFC4E, 0x63EC, 0x901F, 0xD71F, 0x24EC, 0xBB4E, 0x48BD,
+ 0x354E, 0xC6BD, 0x591F, 0xAAEC, 0xEDEC, 0x1E1F, 0x81BD, 0x724E,
+ 0x7A5B, 0x89A8, 0x160A, 0xE5F9, 0xA2F9, 0x510A, 0xCEA8, 0x3D5B,
+ 0x40A8, 0xB35B, 0x2CF9, 0xDF0A, 0x980A, 0x6BF9, 0xF45B, 0x07A8
+ },
+ {
+ 0x0000, 0xFB0B, 0x7DA1, 0x86AA, 0xFB42, 0x0049, 0x86E3, 0x7DE8,
+ 0x7D33, 0x8638, 0x0092, 0xFB99, 0x8671, 0x7D7A, 0xFBD0, 0x00DB,
+ 0xFA66, 0x016D, 0x87C7, 0x7CCC, 0x0124, 0xFA2F, 0x7C85, 0x878E,
+ 0x8755, 0x7C5E, 0xFAF4, 0x01FF, 0x7C17, 0x871C, 0x01B6, 0xFABD,
+ 0x7F7B, 0x8470, 0x02DA, 0xF9D1, 0x8439, 0x7F32, 0xF998, 0x0293,
+ 0x0248, 0xF943, 0x7FE9, 0x84E2, 0xF90A, 0x0201, 0x84AB, 0x7FA0,
+ 0x851D, 0x7E16, 0xF8BC, 0x03B7, 0x7E5F, 0x8554, 0x03FE, 0xF8F5,
+ 0xF82E, 0x0325, 0x858F, 0x7E84, 0x036C, 0xF867, 0x7ECD, 0x85C6,
+ 0xFEF6, 0x05FD, 0x8357, 0x785C, 0x05B4, 0xFEBF, 0x7815, 0x831E,
+ 0x83C5, 0x78CE, 0xFE64, 0x056F, 0x7887, 0x838C, 0x0526, 0xFE2D,
+ 0x0490, 0xFF9B, 0x7931, 0x823A, 0xFFD2, 0x04D9, 0x8273, 0x7978,
+ 0x79A3, 0x82A8, 0x0402, 0xFF09, 0x82E1, 0x79EA, 0xFF40, 0x044B,
+ 0x818D, 0x7A86, 0xFC2C, 0x0727, 0x7ACF, 0x81C4, 0x076E, 0xFC65,
+ 0xFCBE, 0x07B5, 0x811F, 0x7A14, 0x07FC, 0xFCF7, 0x7A5D, 0x8156,
+ 0x7BEB, 0x80E0, 0x064A, 0xFD41, 0x80A9, 0x7BA2, 0xFD08, 0x0603,
+ 0x06D8, 0xFDD3, 0x7B79, 0x8072, 0xFD9A, 0x0691, 0x803B, 0x7B30,
+ 0x765B, 0x8D50, 0x0BFA, 0xF0F1, 0x8D19, 0x7612, 0xF0B8, 0x0BB3,
+ 0x0B68, 0xF063, 0x76C9, 0x8DC2, 0xF02A, 0x0B21, 0x8D8B, 0x7680,
+ 0x8C3D, 0x7736, 0xF19C, 0x0A97, 0x777F, 0x8C74, 0x0ADE, 0xF1D5,
+ 0xF10E, 0x0A05, 0x8CAF, 0x77A4, 0x0A4C, 0xF147, 0x77ED, 0x8CE6,
+ 0x0920, 0xF22B, 0x7481, 0x8F8A, 0xF262, 0x0969, 0x8FC3, 0x74C8,
+ 0x7413, 0x8F18, 0x09B2, 0xF2B9, 0x8F51, 0x745A, 0xF2F0, 0x09FB,
+ 0xF346, 0x084D, 0x8EE7, 0x75EC, 0x0804, 0xF30F, 0x75A5, 0x8EAE,
+ 0x8E75, 0x757E, 0xF3D4, 0x08DF, 0x7537, 0x8E3C, 0x0896, 0xF39D,
+ 0x88AD, 0x73A6, 0xF50C, 0x0E07, 0x73EF, 0x88E4, 0x0E4E, 0xF545,
+ 0xF59E, 0x0E95, 0x883F, 0x7334, 0x0EDC, 0xF5D7, 0x737D, 0x8876,
+ 0x72CB, 0x89C0, 0x0F6A, 0xF461, 0x8989, 0x7282, 0xF428, 0x0F23,
+ 0x0FF8, 0xF4F3, 0x7259, 0x8952, 0xF4BA, 0x0FB1, 0x891B, 0x7210,
+ 0xF7D6, 0x0CDD, 0x8A77, 0x717C, 0x0C94, 0xF79F, 0x7135, 0x8A3E,
+ 0x8AE5, 0x71EE, 0xF744, 0x0C4F, 0x71A7, 0x8AAC, 0x0C06, 0xF70D,
+ 0x0DB0, 0xF6BB, 0x7011, 0x8B1A, 0xF6F2, 0x0DF9, 0x8B53, 0x7058,
+ 0x7083, 0x8B88, 0x0D22, 0xF629, 0x8BC1, 0x70CA, 0xF660, 0x0D6B
+ },
+ {
+ 0x0000, 0xECB6, 0x52DB, 0xBE6D, 0xA5B6, 0x4900, 0xF76D, 0x1BDB,
+ 0xC0DB, 0x2C6D, 0x9200, 0x7EB6, 0x656D, 0x89DB, 0x37B6, 0xDB00,
+ 0x0A01, 0xE6B7, 0x58DA, 0xB46C, 0xAFB7, 0x4301, 0xFD6C, 0x11DA,
+ 0xCADA, 0x266C, 0x9801, 0x74B7, 0x6F6C, 0x83DA, 0x3DB7, 0xD101,
+ 0x1402, 0xF8B4, 0x46D9, 0xAA6F, 0xB1B4, 0x5D02, 0xE36F, 0x0FD9,
+ 0xD4D9, 0x386F, 0x8602, 0x6AB4, 0x716F, 0x9DD9, 0x23B4, 0xCF02,
+ 0x1E03, 0xF2B5, 0x4CD8, 0xA06E, 0xBBB5, 0x5703, 0xE96E, 0x05D8,
+ 0xDED8, 0x326E, 0x8C03, 0x60B5, 0x7B6E, 0x97D8, 0x29B5, 0xC503,
+ 0x2804, 0xC4B2, 0x7ADF, 0x9669, 0x8DB2, 0x6104, 0xDF69, 0x33DF,
+ 0xE8DF, 0x0469, 0xBA04, 0x56B2, 0x4D69, 0xA1DF, 0x1FB2, 0xF304,
+ 0x2205, 0xCEB3, 0x70DE, 0x9C68, 0x87B3, 0x6B05, 0xD568, 0x39DE,
+ 0xE2DE, 0x0E68, 0xB005, 0x5CB3, 0x4768, 0xABDE, 0x15B3, 0xF905,
+ 0x3C06, 0xD0B0, 0x6EDD, 0x826B, 0x99B0, 0x7506, 0xCB6B, 0x27DD,
+ 0xFCDD, 0x106B, 0xAE06, 0x42B0, 0x596B, 0xB5DD, 0x0BB0, 0xE706,
+ 0x3607, 0xDAB1, 0x64DC, 0x886A, 0x93B1, 0x7F07, 0xC16A, 0x2DDC,
+ 0xF6DC, 0x1A6A, 0xA407, 0x48B1, 0x536A, 0xBFDC, 0x01B1, 0xED07,
+ 0x5008, 0xBCBE, 0x02D3, 0xEE65, 0xF5BE, 0x1908, 0xA765, 0x4BD3,
+ 0x90D3, 0x7C65, 0xC208, 0x2EBE, 0x3565, 0xD9D3, 0x67BE, 0x8B08,
+ 0x5A09, 0xB6BF, 0x08D2, 0xE464, 0xFFBF, 0x1309, 0xAD64, 0x41D2,
+ 0x9AD2, 0x7664, 0xC809, 0x24BF, 0x3F64, 0xD3D2, 0x6DBF, 0x8109,
+ 0x440A, 0xA8BC, 0x16D1, 0xFA67, 0xE1BC, 0x0D0A, 0xB367, 0x5FD1,
+ 0x84D1, 0x6867, 0xD60A, 0x3ABC, 0x2167, 0xCDD1, 0x73BC, 0x9F0A,
+ 0x4E0B, 0xA2BD, 0x1CD0, 0xF066, 0xEBBD, 0x070B, 0xB966, 0x55D0,
+ 0x8ED0, 0x6266, 0xDC0B, 0x30BD, 0x2B66, 0xC7D0, 0x79BD, 0x950B,
+ 0x780C, 0x94BA, 0x2AD7, 0xC661, 0xDDBA, 0x310C, 0x8F61, 0x63D7,
+ 0xB8D7, 0x5461, 0xEA0C, 0x06BA, 0x1D61, 0xF1D7, 0x4FBA, 0xA30C,
+ 0x720D, 0x9EBB, 0x20D6, 0xCC60, 0xD7BB, 0x3B0D, 0x8560, 0x69D6,
+ 0xB2D6, 0x5E60, 0xE00D, 0x0CBB, 0x1760, 0xFBD6, 0x45BB, 0xA90D,
+ 0x6C0E, 0x80B8, 0x3ED5, 0xD263, 0xC9B8, 0x250E, 0x9B63, 0x77D5,
+ 0xACD5, 0x4063, 0xFE0E, 0x12B8, 0x0963, 0xE5D5, 0x5BB8, 0xB70E,
+ 0x660F, 0x8AB9, 0x34D4, 0xD862, 0xC3B9, 0x2F0F, 0x9162, 0x7DD4,
+ 0xA6D4, 0x4A62, 0xF40F, 0x18B9, 0x0362, 0xEFD4, 0x51B9, 0xBD0F
+ },
+ {
+ 0x0000, 0xA010, 0xCB97, 0x6B87, 0x1C99, 0xBC89, 0xD70E, 0x771E,
+ 0x3932, 0x9922, 0xF2A5, 0x52B5, 0x25AB, 0x85BB, 0xEE3C, 0x4E2C,
+ 0x7264, 0xD274, 0xB9F3, 0x19E3, 0x6EFD, 0xCEED, 0xA56A, 0x057A,
+ 0x4B56, 0xEB46, 0x80C1, 0x20D1, 0x57CF, 0xF7DF, 0x9C58, 0x3C48,
+ 0xE4C8, 0x44D8, 0x2F5F, 0x8F4F, 0xF851, 0x5841, 0x33C6, 0x93D6,
+ 0xDDFA, 0x7DEA, 0x166D, 0xB67D, 0xC163, 0x6173, 0x0AF4, 0xAAE4,
+ 0x96AC, 0x36BC, 0x5D3B, 0xFD2B, 0x8A35, 0x2A25, 0x41A2, 0xE1B2,
+ 0xAF9E, 0x0F8E, 0x6409, 0xC419, 0xB307, 0x1317, 0x7890, 0xD880,
+ 0x4227, 0xE237, 0x89B0, 0x29A0, 0x5EBE, 0xFEAE, 0x9529, 0x3539,
+ 0x7B15, 0xDB05, 0xB082, 0x1092, 0x678C, 0xC79C, 0xAC1B, 0x0C0B,
+ 0x3043, 0x9053, 0xFBD4, 0x5BC4, 0x2CDA, 0x8CCA, 0xE74D, 0x475D,
+ 0x0971, 0xA961, 0xC2E6, 0x62F6, 0x15E8, 0xB5F8, 0xDE7F, 0x7E6F,
+ 0xA6EF, 0x06FF, 0x6D78, 0xCD68, 0xBA76, 0x1A66, 0x71E1, 0xD1F1,
+ 0x9FDD, 0x3FCD, 0x544A, 0xF45A, 0x8344, 0x2354, 0x48D3, 0xE8C3,
+ 0xD48B, 0x749B, 0x1F1C, 0xBF0C, 0xC812, 0x6802, 0x0385, 0xA395,
+ 0xEDB9, 0x4DA9, 0x262E, 0x863E, 0xF120, 0x5130, 0x3AB7, 0x9AA7,
+ 0x844E, 0x245E, 0x4FD9, 0xEFC9, 0x98D7, 0x38C7, 0x5340, 0xF350,
+ 0xBD7C, 0x1D6C, 0x76EB, 0xD6FB, 0xA1E5, 0x01F5, 0x6A72, 0xCA62,
+ 0xF62A, 0x563A, 0x3DBD, 0x9DAD, 0xEAB3, 0x4AA3, 0x2124, 0x8134,
+ 0xCF18, 0x6F08, 0x048F, 0xA49F, 0xD381, 0x7391, 0x1816, 0xB806,
+ 0x6086, 0xC096, 0xAB11, 0x0B01, 0x7C1F, 0xDC0F, 0xB788, 0x1798,
+ 0x59B4, 0xF9A4, 0x9223, 0x3233, 0x452D, 0xE53D, 0x8EBA, 0x2EAA,
+ 0x12E2, 0xB2F2, 0xD975, 0x7965, 0x0E7B, 0xAE6B, 0xC5EC, 0x65FC,
+ 0x2BD0, 0x8BC0, 0xE047, 0x4057, 0x3749, 0x9759, 0xFCDE, 0x5CCE,
+ 0xC669, 0x6679, 0x0DFE, 0xADEE, 0xDAF0, 0x7AE0, 0x1167, 0xB177,
+ 0xFF5B, 0x5F4B, 0x34CC, 0x94DC, 0xE3C2, 0x43D2, 0x2855, 0x8845,
+ 0xB40D, 0x141D, 0x7F9A, 0xDF8A, 0xA894, 0x0884, 0x6303, 0xC313,
+ 0x8D3F, 0x2D2F, 0x46A8, 0xE6B8, 0x91A6, 0x31B6, 0x5A31, 0xFA21,
+ 0x22A1, 0x82B1, 0xE936, 0x4926, 0x3E38, 0x9E28, 0xF5AF, 0x55BF,
+ 0x1B93, 0xBB83, 0xD004, 0x7014, 0x070A, 0xA71A, 0xCC9D, 0x6C8D,
+ 0x50C5, 0xF0D5, 0x9B52, 0x3B42, 0x4C5C, 0xEC4C, 0x87CB, 0x27DB,
+ 0x69F7, 0xC9E7, 0xA260, 0x0270, 0x756E, 0xD57E, 0xBEF9, 0x1EE9
+ },
+ {
+ 0x0000, 0x832B, 0x8DE1, 0x0ECA, 0x9075, 0x135E, 0x1D94, 0x9EBF,
+ 0xAB5D, 0x2876, 0x26BC, 0xA597, 0x3B28, 0xB803, 0xB6C9, 0x35E2,
+ 0xDD0D, 0x5E26, 0x50EC, 0xD3C7, 0x4D78, 0xCE53, 0xC099, 0x43B2,
+ 0x7650, 0xF57B, 0xFBB1, 0x789A, 0xE625, 0x650E, 0x6BC4, 0xE8EF,
+ 0x31AD, 0xB286, 0xBC4C, 0x3F67, 0xA1D8, 0x22F3, 0x2C39, 0xAF12,
+ 0x9AF0, 0x19DB, 0x1711, 0x943A, 0x0A85, 0x89AE, 0x8764, 0x044F,
+ 0xECA0, 0x6F8B, 0x6141, 0xE26A, 0x7CD5, 0xFFFE, 0xF134, 0x721F,
+ 0x47FD, 0xC4D6, 0xCA1C, 0x4937, 0xD788, 0x54A3, 0x5A69, 0xD942,
+ 0x635A, 0xE071, 0xEEBB, 0x6D90, 0xF32F, 0x7004, 0x7ECE, 0xFDE5,
+ 0xC807, 0x4B2C, 0x45E6, 0xC6CD, 0x5872, 0xDB59, 0xD593, 0x56B8,
+ 0xBE57, 0x3D7C, 0x33B6, 0xB09D, 0x2E22, 0xAD09, 0xA3C3, 0x20E8,
+ 0x150A, 0x9621, 0x98EB, 0x1BC0, 0x857F, 0x0654, 0x089E, 0x8BB5,
+ 0x52F7, 0xD1DC, 0xDF16, 0x5C3D, 0xC282, 0x41A9, 0x4F63, 0xCC48,
+ 0xF9AA, 0x7A81, 0x744B, 0xF760, 0x69DF, 0xEAF4, 0xE43E, 0x6715,
+ 0x8FFA, 0x0CD1, 0x021B, 0x8130, 0x1F8F, 0x9CA4, 0x926E, 0x1145,
+ 0x24A7, 0xA78C, 0xA946, 0x2A6D, 0xB4D2, 0x37F9, 0x3933, 0xBA18,
+ 0xC6B4, 0x459F, 0x4B55, 0xC87E, 0x56C1, 0xD5EA, 0xDB20, 0x580B,
+ 0x6DE9, 0xEEC2, 0xE008, 0x6323, 0xFD9C, 0x7EB7, 0x707D, 0xF356,
+ 0x1BB9, 0x9892, 0x9658, 0x1573, 0x8BCC, 0x08E7, 0x062D, 0x8506,
+ 0xB0E4, 0x33CF, 0x3D05, 0xBE2E, 0x2091, 0xA3BA, 0xAD70, 0x2E5B,
+ 0xF719, 0x7432, 0x7AF8, 0xF9D3, 0x676C, 0xE447, 0xEA8D, 0x69A6,
+ 0x5C44, 0xDF6F, 0xD1A5, 0x528E, 0xCC31, 0x4F1A, 0x41D0, 0xC2FB,
+ 0x2A14, 0xA93F, 0xA7F5, 0x24DE, 0xBA61, 0x394A, 0x3780, 0xB4AB,
+ 0x8149, 0x0262, 0x0CA8, 0x8F83, 0x113C, 0x9217, 0x9CDD, 0x1FF6,
+ 0xA5EE, 0x26C5, 0x280F, 0xAB24, 0x359B, 0xB6B0, 0xB87A, 0x3B51,
+ 0x0EB3, 0x8D98, 0x8352, 0x0079, 0x9EC6, 0x1DED, 0x1327, 0x900C,
+ 0x78E3, 0xFBC8, 0xF502, 0x7629, 0xE896, 0x6BBD, 0x6577, 0xE65C,
+ 0xD3BE, 0x5095, 0x5E5F, 0xDD74, 0x43CB, 0xC0E0, 0xCE2A, 0x4D01,
+ 0x9443, 0x1768, 0x19A2, 0x9A89, 0x0436, 0x871D, 0x89D7, 0x0AFC,
+ 0x3F1E, 0xBC35, 0xB2FF, 0x31D4, 0xAF6B, 0x2C40, 0x228A, 0xA1A1,
+ 0x494E, 0xCA65, 0xC4AF, 0x4784, 0xD93B, 0x5A10, 0x54DA, 0xD7F1,
+ 0xE213, 0x6138, 0x6FF2, 0xECD9, 0x7266, 0xF14D, 0xFF87, 0x7CAC
+ },
+#endif
};

__u16 crc_t10dif_generic(__u16 crc, const unsigned char *buffer, size_t len)
{
- unsigned int i;
+ const u8 *ptr = (const __u8 *)buffer;
+ const u8 *ptr_end = ptr + len;
+#if CONFIG_CRYPTO_CRCT10DIF_TABLE_SIZE > 1
+ size_t tablesize = 1 << (CONFIG_CRYPTO_CRCT10DIF_TABLE_SIZE - 1);
+ const u8 *ptr_last = ptr + (len / tablesize * tablesize);

- for (i = 0 ; i < len ; i++)
- crc = (crc << 8) ^ t10_dif_crc_table[((crc >> 8) ^ buffer[i]) & 0xff];
+ while (ptr < ptr_last) {
+ size_t index = tablesize;
+ __u16 t;
+
+ t = t10dif_crc_table[--index][*ptr++ ^ (u8)(crc >> 8)];
+ t ^= t10dif_crc_table[--index][*ptr++ ^ (u8)crc];
+ crc = t;
+ while (index > 0)
+ crc ^= t10dif_crc_table[--index][*ptr++];
+ }
+#endif
+ while (ptr < ptr_end)
+ crc = t10dif_crc_table[0][*ptr++ ^ (u8)(crc >> 8)] ^ (crc << 8);

return crc;
}
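
For readers following the loop above, here is a minimal user-space sketch of the slicing idea; it is not the kernel code. Sub-table k holds the CRC of one byte followed by k zero bytes, so a block of N bytes can be folded with N independent lookups XORed together. The names `tbl`, `t10dif_build_tables`, `t10dif_bytewise`, `t10dif_sliced` and `t10dif_check_equivalence` are hypothetical, and the tables are generated at run time rather than written out as constants. It assumes the 0x8bb7 generator from the file's header comment and an initial CRC of 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define T10DIF_POLY 0x8BB7u /* generator from the file's header comment */
#define SLICES 16           /* 1 << (5 - 1): the largest configuration */

/* Hypothetical stand-in for t10dif_crc_table, built at run time. */
static uint16_t tbl[SLICES][256];

static void t10dif_build_tables(void)
{
	/* tbl[0][b]: CRC of the single byte b, computed bit by bit. */
	for (unsigned b = 0; b < 256; b++) {
		uint16_t crc = (uint16_t)(b << 8);

		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ T10DIF_POLY)
					      : (uint16_t)(crc << 1);
		tbl[0][b] = crc;
	}
	/* tbl[k][b]: CRC of byte b followed by k zero bytes. */
	for (int k = 1; k < SLICES; k++) {
		for (unsigned b = 0; b < 256; b++) {
			uint16_t prev = tbl[k - 1][b];

			tbl[k][b] = (uint16_t)((prev << 8) ^ tbl[0][prev >> 8]);
		}
	}
}

/* One lookup per byte, equivalent to the loop the patch removes. */
static uint16_t t10dif_bytewise(uint16_t crc, const uint8_t *p, size_t len)
{
	while (len--)
		crc = (uint16_t)((crc << 8) ^ tbl[0][(crc >> 8) ^ *p++]);
	return crc;
}

/* SLICES bytes per iteration; the tail falls back to the bytewise loop. */
static uint16_t t10dif_sliced(uint16_t crc, const uint8_t *p, size_t len)
{
	while (len >= SLICES) {
		/* The running CRC is folded into the first two bytes only. */
		uint16_t t = tbl[SLICES - 1][p[0] ^ (crc >> 8)];

		t ^= tbl[SLICES - 2][p[1] ^ (crc & 0xFF)];
		for (int i = 2; i < SLICES; i++)
			t ^= tbl[SLICES - 1 - i][p[i]];
		crc = t;
		p += SLICES;
		len -= SLICES;
	}
	return t10dif_bytewise(crc, p, len);
}

/* Compare both paths over every prefix length of a scratch buffer. */
static void t10dif_check_equivalence(void)
{
	uint8_t buf[100];

	for (size_t i = 0; i < sizeof(buf); i++)
		buf[i] = (uint8_t)(i * 7 + 3);
	for (size_t n = 0; n <= sizeof(buf); n++)
		assert(t10dif_sliced(0, buf, n) == t10dif_bytewise(0, buf, n));
}
```

Calling `t10dif_build_tables()` once and then `t10dif_check_equivalence()` exercises both full 16-byte blocks and the bytewise tail. The lookups inside one block do not depend on each other, which is presumably where the speedup comes from.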

2018-08-11 00:34:25

by Nicolas Pitre

Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.

On Fri, 10 Aug 2018, Joe Perches wrote:

> On Fri, 2018-08-10 at 16:02 -0400, Nicolas Pitre wrote:
> > On Fri, 10 Aug 2018, Joe Perches wrote:
> >
> > > On Fri, 2018-08-10 at 14:12 -0500, Jeff Lien wrote:
> > > > This patch provides a performance improvement for the CRC16 calculations done in read/write
> > > > workloads using the T10 Type 1/2/3 guard field. For example, today with sequential write
> > > > workloads (one thread/CPU of IO) we consume 100% of the CPU because of the CRC16 computation
> > > > bottleneck. Today's block devices are considerably faster, but the CRC16 calculation prevents
> > > > folks from utilizing the throughput of such devices. To speed up this calculation and expose
> > > > the block device throughput, we slice the old single byte for loop into a 16 byte for loop,
> > > > with a larger CRC table to match. The result has shown 5x performance improvements on various
> > > > big endian and little endian systems running the 4.18.0 kernel version.
> > >
> > > Thanks.
> > >
> > > This seems a sensible tradeoff for the 4k text size increase.
> >
> > More like 7.5KB. Would be best if this was configurable so the small
> > version remained available.
>
> Maybe something like: (compiled, untested)
> ---
> crypto/Kconfig | 10 +
> crypto/crct10dif_common.c | 543 +++++++++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 549 insertions(+), 4 deletions(-)
>
> diff --git a/crypto/Kconfig b/crypto/Kconfig
> index f3e40ac56d93..88d9d17bb18a 100644
> --- a/crypto/Kconfig
> +++ b/crypto/Kconfig
> @@ -618,6 +618,16 @@ config CRYPTO_CRCT10DIF
> a crypto transform. This allows for faster crc t10 diff
> transforms to be used if they are available.
>
> +config CRYPTO_CRCT10DIF_TABLE_SIZE
> + int "Size of CRCT10DIF crc tables (as a power of 2)"
> + depends on CRYPTO_CRCT10DIF
> + range 1 5
> + default 1 if EMBEDDED
> + default 5

You could even make the prompt depend on EXPERT.

I like it!

Acked-by: Nicolas Pitre <[email protected]>


Nicolas

2018-08-11 02:39:19

by Douglas Gilbert

Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.

On 2018-08-10 08:11 PM, Joe Perches wrote:
> On Fri, 2018-08-10 at 16:02 -0400, Nicolas Pitre wrote:
>> On Fri, 10 Aug 2018, Joe Perches wrote:
>>
>>> On Fri, 2018-08-10 at 14:12 -0500, Jeff Lien wrote:
>>>> This patch provides a performance improvement for the CRC16 calculations done in read/write
>>>> workloads using the T10 Type 1/2/3 guard field. For example, today with sequential write
>>>> workloads (one thread/CPU of IO) we consume 100% of the CPU because of the CRC16 computation
>>>> bottleneck. Today's block devices are considerably faster, but the CRC16 calculation prevents
>>>> folks from utilizing the throughput of such devices. To speed up this calculation and expose
>>>> the block device throughput, we slice the old single byte for loop into a 16 byte for loop,
>>>> with a larger CRC table to match. The result has shown 5x performance improvements on various
>>>> big endian and little endian systems running the 4.18.0 kernel version.
>>>
>>> Thanks.
>>>
>>> This seems a sensible tradeoff for the 4k text size increase.
>>
>> More like 7.5KB. Would be best if this was configurable so the small
>> version remained available.
>
> Maybe something like: (compiled, untested)
> ---
> crypto/Kconfig | 10 +
> crypto/crct10dif_common.c | 543 +++++++++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 549 insertions(+), 4 deletions(-)
>
> diff --git a/crypto/Kconfig b/crypto/Kconfig
> index f3e40ac56d93..88d9d17bb18a 100644
> --- a/crypto/Kconfig
> +++ b/crypto/Kconfig
> @@ -618,6 +618,16 @@ config CRYPTO_CRCT10DIF
> a crypto transform. This allows for faster crc t10 diff
> transforms to be used if they are available.
>
> +config CRYPTO_CRCT10DIF_TABLE_SIZE
> + int "Size of CRCT10DIF crc tables (as a power of 2)"
> + depends on CRYPTO_CRCT10DIF
> + range 1 5
> + default 1 if EMBEDDED
> + default 5
> + help
> + Set the table size used by the CRYPTO_CRCT10DIF crc calculation
> + Larger values use more memory and are faster.
> +
> config CRYPTO_CRCT10DIF_PCLMUL
> tristate "CRCT10DIF PCLMULQDQ hardware acceleration"
> depends on X86 && 64BIT && CRC_T10DIF
> diff --git a/crypto/crct10dif_common.c b/crypto/crct10dif_common.c
> index b2fab366f518..4eb1c50c3688 100644
> --- a/crypto/crct10dif_common.c
> +++ b/crypto/crct10dif_common.c
> @@ -32,7 +32,8 @@
> * x^16 + x^15 + x^11 + x^9 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1
> * gt: 0x8bb7
> */
> -static const __u16 t10_dif_crc_table[256] = {
> +static const __u16 t10dif_crc_table[][256] = {
> + {

<snip data table>

> + },
> +#endif
> };
>
> __u16 crc_t10dif_generic(__u16 crc, const unsigned char *buffer, size_t len)
> {
> - unsigned int i;
> + const u8 *ptr = (const __u8 *)buffer;
> + const u8 *ptr_end = ptr + len;
> +#if CONFIG_CRYPTO_CRCT10DIF_TABLE_SIZE > 1
> + size_t tablesize = 1 << (CONFIG_CRYPTO_CRCT10DIF_TABLE_SIZE - 1);
> + const u8 *ptr_last = ptr + (len / tablesize * tablesize);
>
> - for (i = 0 ; i < len ; i++)
> - crc = (crc << 8) ^ t10_dif_crc_table[((crc >> 8) ^ buffer[i]) & 0xff];
> + while (ptr < ptr_last) {
> + size_t index = tablesize;
> + __u16 t;
> +
> + t = t10dif_crc_table[--index][*ptr++ ^ (u8)(crc >> 8)];
> + t ^= t10dif_crc_table[--index][*ptr++ ^ (u8)crc];
> + crc = t;
> + while (index > 0)
> + crc ^= t10dif_crc_table[--index][*ptr++];
> + }
> +#endif
> + while (ptr < ptr_end)
> + crc = t10dif_crc_table[0][*ptr++ ^ (u8)(crc >> 8)] ^ (crc << 8);
>
> return crc;
> }
>
>

It is a bit messy, but below is a copy and paste of Table 27 from draft SBC-4
revision 15, chapter 4.22.4.4 on page 87.

Table 27 — CRC test cases
Pattern                                                    CRC
32 bytes each set to 00h                                   0000h
32 bytes each set to FFh                                   A293h
32 bytes of an incrementing pattern from 00h to 1Fh        0224h
2 bytes each set to FFh followed by 30 bytes set to 00h    21B8h
32 bytes of a decrementing pattern from FFh to E0h         A0B7h

There is also example C code for its calculation in Annex C on pages
375 and 376.
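
The Annex C code is not reproduced here. As a hedged sketch only, a bit-at-a-time reference can be used to check any table-driven variant against the five patterns above. The function and array names (`crc_t10dif_bitwise`, `table27_fill`, `table27_crc`, `table27_check`) are made up for illustration; the expected values are the ones quoted from Table 27, and the sketch assumes an initial CRC of 0 with the 0x8bb7 generator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit-at-a-time CRC, MSB first, generator 0x8BB7, initial value 0. */
static uint16_t crc_t10dif_bitwise(const uint8_t *p, size_t len)
{
	uint16_t crc = 0;

	while (len--) {
		crc ^= (uint16_t)(*p++ << 8);
		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x8BB7u)
					      : (uint16_t)(crc << 1);
	}
	return crc;
}

/* Fill buf with one of the five Table 27 patterns (which = 0..4). */
static void table27_fill(int which, uint8_t buf[32])
{
	for (int i = 0; i < 32; i++) {
		switch (which) {
		case 0:  buf[i] = 0x00; break;                  /* all 00h */
		case 1:  buf[i] = 0xFF; break;                  /* all FFh */
		case 2:  buf[i] = (uint8_t)i; break;            /* 00h..1Fh */
		case 3:  buf[i] = (i < 2) ? 0xFF : 0x00; break; /* FFh FFh, then 00h */
		default: buf[i] = (uint8_t)(0xFF - i); break;   /* FFh..E0h */
		}
	}
}

/* Expected CRCs, in the order the patterns are listed in Table 27. */
static const uint16_t table27_crc[5] = {
	0x0000, 0xA293, 0x0224, 0x21B8, 0xA0B7
};

/* Abort via assert() if any pattern does not match its listed CRC. */
static void table27_check(void)
{
	for (int t = 0; t < 5; t++) {
		uint8_t buf[32];

		table27_fill(t, buf);
		assert(crc_t10dif_bitwise(buf, sizeof(buf)) == table27_crc[t]);
	}
}
```

To test a sliced implementation, replace `crc_t10dif_bitwise` with it in `table27_check`. Each 32-byte pattern then covers one or more full blocks for every slice width up to 16.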

Doug Gilbert

2018-08-11 09:04:06

by Joe Perches

Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.

On Fri, 2018-08-10 at 22:39 -0400, Douglas Gilbert wrote:
> but below is a copy and paste of a table 27 from draft SBC-4
> revision 15 in chapter 4.22.4.4 on page 87.

The posted code returns the proper crc for each
CONFIG_CRYPTO_CRCT10DIF_TABLE_SIZE value from
1 to 5 for these arrays.

2018-08-11 15:06:10

by Joe Perches

Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.

On Sat, 2018-08-11 at 02:04 -0700, Joe Perches wrote:
> On Fri, 2018-08-10 at 22:39 -0400, Douglas Gilbert wrote:
> > but below is a copy and paste of a table 27 from draft SBC-4
> > revision 15 in chapter 4.22.4.4 on page 87.
>
> The posted code returns the proper crc for each
> CONFIG_CRYPTO_CRCT10DIF_TABLE_SIZE value from
> 1 to 5 for these arrays.

Jeff, could you please test the suggested patch
with your comparison framework again with each
CONFIG_CRYPTO_CRCT10DIF_TABLE_SIZE from 1 to 5?

On a very limited test framework here I get
(runtime, average of 10 runs):

1: 4.32
2: 1.86
3: 1.31
4: 1.05
5: 0.99

2018-08-11 15:36:20

by Martin K. Petersen

Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.


Jeff,

> This patch provides a performance improvement for the CRC16
> calculations done in read/write workloads using the T10 Type 1/2/3
> guard field. For example, today with sequential write workloads (one
> thread/CPU of IO) we consume 100% of the CPU because of the CRC16
> computation bottleneck. Today's block devices are considerably
> faster, but the CRC16 calculation prevents folks from utilizing the
> throughput of such devices. To speed up this calculation and expose
> the block device throughput, we slice the old single byte for loop
> into a 16 byte for loop, with a larger CRC table to match. The result
> has shown 5x performance improvements on various big endian and little
> endian systems running the 4.18.0 kernel version.

The reason I went with a simple slice-by-one approach was that the
larger tables had a negative impact on the CPU caches. So while
slice-by-N numbers looked better in synthetic benchmarks, actual
application performance started getting affected as the tables grew
larger.

These days we obviously use the hardware-accelerated CRC calculation so
the software table approach mostly serves as a reference
implementation. But given your big vs. little endian performance
metrics, I'm assuming you guys are focused on embedded processors
without support for CRC acceleration?

I have no problem providing a choice for bigger tables. My only concern
is that the selection heuristics need to be more than one-dimensional.
Latency and cache side effects are often more important than throughput.
At least on the initiator side.

Also, I'd like to keep the original slice-by-one implementation for
reference purposes.

--
Martin K. Petersen Oracle Linux Engineering
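
To put numbers on the cache concern, the table footprint for each value of the proposed CONFIG_CRYPTO_CRCT10DIF_TABLE_SIZE option can be worked out directly: the option selects 1 << (value - 1) tables of 256 16-bit entries. The helper names below are hypothetical and only illustrate the arithmetic.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of 256-entry tables selected by a config value in the range 1..5. */
size_t crct10dif_num_tables(unsigned int config_value)
{
	return (size_t)1 << (config_value - 1);
}

/* Bytes of table data: each table holds 256 entries of 16 bits. */
size_t crct10dif_table_bytes(unsigned int config_value)
{
	return crct10dif_num_tables(config_value) * 256 * sizeof(uint16_t);
}
```

This gives 512 bytes for a value of 1 and 8192 bytes for a value of 5. The difference, 7680 bytes, is the 7.5KB growth mentioned elsewhere in the thread for the 16-table version.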

2018-08-11 16:35:42

by Joe Perches

Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.

On Sat, 2018-08-11 at 11:36 -0400, Martin K. Petersen wrote:
> Jeff,
>
> > This patch provides a performance improvement for the CRC16
> > calculations done in read/write workloads using the T10 Type 1/2/3
> > guard field. For example, today with sequential write workloads (one
> > thread/CPU of IO) we consume 100% of the CPU because of the CRC16
> > computation bottleneck. Today's block devices are considerably
> > faster, but the CRC16 calculation prevents folks from utilizing the
> > throughput of such devices. To speed up this calculation and expose
> > the block device throughput, we slice the old single byte for loop
> > into a 16 byte for loop, with a larger CRC table to match. The result
> > has shown 5x performance improvements on various big endian and little
> > endian systems running the 4.18.0 kernel version.
>
> The reason I went with a simple slice-by-one approach was that the
> larger tables had a negative impact on the CPU caches. So while
> slice-by-N numbers looked better in synthetic benchmarks, actual
> application performance started getting affected as the tables grew
> larger.
>
> These days we obviously use the hardware-accelerated CRC calculation so
> the software table approach mostly serves as a reference
> implementation. But given your big vs. little endian performance
> metrics, I'm assuming you guys are focused on embedded processors
> without support for CRC acceleration?
>
> I have no problem providing a choice for bigger tables. My only concern
> is that the selection heuristics need to be more than one-dimensional.
> Latency and cache side effects are often more important than throughput.
> At least on the initiator side.
>
> Also, I'd like to keep the original slice-by-one implementation for
> reference purposes.

Did you see the suggested patch that allows
either 1, 2, 4, 8 or 16 block table sizes?

Perhaps you have a comment on that?

2018-08-13 03:36:17

by Douglas Gilbert

Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.

On 2018-08-10 08:11 PM, Joe Perches wrote:
> On Fri, 2018-08-10 at 16:02 -0400, Nicolas Pitre wrote:
>> On Fri, 10 Aug 2018, Joe Perches wrote:
>>
>>> On Fri, 2018-08-10 at 14:12 -0500, Jeff Lien wrote:
>>>> This patch provides a performance improvement for the CRC16 calculations done in read/write
>>>> workloads using the T10 Type 1/2/3 guard field. For example, today with sequential write
>>>> workloads (one thread/CPU of IO) we consume 100% of the CPU because of the CRC16 computation
>>>> bottleneck. Today's block devices are considerably faster, but the CRC16 calculation prevents
>>>> folks from utilizing the throughput of such devices. To speed up this calculation and expose
>>>> the block device throughput, we slice the old single byte for loop into a 16 byte for loop,
>>>> with a larger CRC table to match. The result has shown 5x performance improvements on various
>>>> big endian and little endian systems running the 4.18.0 kernel version.
>>>
>>> Thanks.
>>>
>>> This seems a sensible tradeoff for the 4k text size increase.
>>
>> More like 7.5KB. Would be best if this was configurable so the small
>> version remained available.
>
> Maybe something like: (compiled, untested)
> ---
> crypto/Kconfig | 10 +
> crypto/crct10dif_common.c | 543 +++++++++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 549 insertions(+), 4 deletions(-)
>
> diff --git a/crypto/Kconfig b/crypto/Kconfig
> index f3e40ac56d93..88d9d17bb18a 100644
> --- a/crypto/Kconfig
> +++ b/crypto/Kconfig
> @@ -618,6 +618,16 @@ config CRYPTO_CRCT10DIF
> a crypto transform. This allows for faster crc t10 diff
> transforms to be used if they are available.
>
> +config CRYPTO_CRCT10DIF_TABLE_SIZE
> + int "Size of CRCT10DIF crc tables (as a power of 2)"
> + depends on CRYPTO_CRCT10DIF
> + range 1 5
> + default 1 if EMBEDDED
> + default 5
> + help
> + Set the table size used by the CRYPTO_CRCT10DIF crc calculation
> + Larger values use more memory and are faster.
> +
> config CRYPTO_CRCT10DIF_PCLMUL
> tristate "CRCT10DIF PCLMULQDQ hardware acceleration"
> depends on X86 && 64BIT && CRC_T10DIF
> diff --git a/crypto/crct10dif_common.c b/crypto/crct10dif_common.c
> index b2fab366f518..4eb1c50c3688 100644
> --- a/crypto/crct10dif_common.c
> +++ b/crypto/crct10dif_common.c
> @@ -32,7 +32,8 @@
> * x^16 + x^15 + x^11 + x^9 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1
> * gt: 0x8bb7
> */
> -static const __u16 t10_dif_crc_table[256] = {
> +static const __u16 t10dif_crc_table[][256] = {

<snip table>

> };
>
> __u16 crc_t10dif_generic(__u16 crc, const unsigned char *buffer, size_t len)
> {
> - unsigned int i;
> + const u8 *ptr = (const __u8 *)buffer;
> + const u8 *ptr_end = ptr + len;
> +#if CONFIG_CRYPTO_CRCT10DIF_TABLE_SIZE > 1
> + size_t tablesize = 1 << (CONFIG_CRYPTO_CRCT10DIF_TABLE_SIZE - 1);
> + const u8 *ptr_last = ptr + (len / tablesize * tablesize);
>
> - for (i = 0 ; i < len ; i++)
> - crc = (crc << 8) ^ t10_dif_crc_table[((crc >> 8) ^ buffer[i]) & 0xff];
> + while (ptr < ptr_last) {
> + size_t index = tablesize;
> + __u16 t;
> +
> + t = t10dif_crc_table[--index][*ptr++ ^ (u8)(crc >> 8)];
> + t ^= t10dif_crc_table[--index][*ptr++ ^ (u8)crc];
> + crc = t;
> + while (index > 0)
> + crc ^= t10dif_crc_table[--index][*ptr++];
> + }
> +#endif
> + while (ptr < ptr_end)
> + crc = t10dif_crc_table[0][*ptr++ ^ (u8)(crc >> 8)] ^ (crc << 8);
>
> return crc;
> }
>
>

The attached patch is on top of the one above. I tested it in user space,
where it is around 20% faster (with a full size table). I also tried swab16 but
there was no gain from that (perhaps around a 2% loss).

Doug Gilbert


Attachments:
crc_t10dif_on_jp.patch (1.42 kB)

2018-08-13 04:29:58

by Joe Perches

Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.

On Sun, 2018-08-12 at 23:36 -0400, Douglas Gilbert wrote:
> On 2018-08-10 08:11 PM, Joe Perches wrote:
> > On Fri, 2018-08-10 at 16:02 -0400, Nicolas Pitre wrote:
> > > On Fri, 10 Aug 2018, Joe Perches wrote:
> > >
> > > > On Fri, 2018-08-10 at 14:12 -0500, Jeff Lien wrote:
> > > > > This patch provides a performance improvement for the CRC16 calculations done in read/write
> > > > > workloads using the T10 Type 1/2/3 guard field. For example, today with sequential write
> > > > > workloads (one thread/CPU of IO) we consume 100% of the CPU because of the CRC16 computation
> > > > > bottleneck. Today's block devices are considerably faster, but the CRC16 calculation prevents
> > > > > folks from utilizing the throughput of such devices. To speed up this calculation and expose
> > > > > the block device throughput, we slice the old single byte for loop into a 16 byte for loop,
> > > > > with a larger CRC table to match. The result has shown 5x performance improvements on various
> > > > > big endian and little endian systems running the 4.18.0 kernel version.
> > > >
> > > > Thanks.
> > > >
> > > > This seems a sensible tradeoff for the 4k text size increase.
> > >
> > > More like 7.5KB. Would be best if this was configurable so the small
> > > version remained available.
> >
> > Maybe something like: (compiled, untested)
> > ---
> > crypto/Kconfig | 10 +
> > crypto/crct10dif_common.c | 543 +++++++++++++++++++++++++++++++++++++++++++++-
> > 2 files changed, 549 insertions(+), 4 deletions(-)
> >
> > diff --git a/crypto/Kconfig b/crypto/Kconfig
> > index f3e40ac56d93..88d9d17bb18a 100644
> > --- a/crypto/Kconfig
> > +++ b/crypto/Kconfig
> > @@ -618,6 +618,16 @@ config CRYPTO_CRCT10DIF
> > a crypto transform. This allows for faster crc t10 diff
> > transforms to be used if they are available.
> >
> > +config CRYPTO_CRCT10DIF_TABLE_SIZE
> > + int "Size of CRCT10DIF crc tables (as a power of 2)"
> > + depends on CRYPTO_CRCT10DIF
> > + range 1 5
> > + default 1 if EMBEDDED
> > + default 5
> > + help
> > + Set the table size used by the CRYPTO_CRCT10DIF crc calculation
> > + Larger values use more memory and are faster.
> > +
> > config CRYPTO_CRCT10DIF_PCLMUL
> > tristate "CRCT10DIF PCLMULQDQ hardware acceleration"
> > depends on X86 && 64BIT && CRC_T10DIF
> > diff --git a/crypto/crct10dif_common.c b/crypto/crct10dif_common.c
> > index b2fab366f518..4eb1c50c3688 100644
> > --- a/crypto/crct10dif_common.c
> > +++ b/crypto/crct10dif_common.c
> > @@ -32,7 +32,8 @@
> > * x^16 + x^15 + x^11 + x^9 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1
> > * gt: 0x8bb7
> > */
> > -static const __u16 t10_dif_crc_table[256] = {
> > +static const __u16 t10dif_crc_table[][256] = {
>
> <snip table>
>
> > };
> >
> > __u16 crc_t10dif_generic(__u16 crc, const unsigned char *buffer, size_t len)
> > {
> > - unsigned int i;
> > + const u8 *ptr = (const __u8 *)buffer;
> > + const u8 *ptr_end = ptr + len;
> > +#if CONFIG_CRYPTO_CRCT10DIF_TABLE_SIZE > 1
> > + size_t tablesize = 1 << (CONFIG_CRYPTO_CRCT10DIF_TABLE_SIZE - 1);
> > + const u8 *ptr_last = ptr + (len / tablesize * tablesize);
> >
> > - for (i = 0 ; i < len ; i++)
> > - crc = (crc << 8) ^ t10_dif_crc_table[((crc >> 8) ^ buffer[i]) & 0xff];
> > + while (ptr < ptr_last) {
> > + size_t index = tablesize;
> > + __u16 t;
> > +
> > + t = t10dif_crc_table[--index][*ptr++ ^ (u8)(crc >> 8)];
> > + t ^= t10dif_crc_table[--index][*ptr++ ^ (u8)crc];
> > + crc = t;
> > + while (index > 0)
> > + crc ^= t10dif_crc_table[--index][*ptr++];
> > + }
> > +#endif
> > + while (ptr < ptr_end)
> > + crc = t10dif_crc_table[0][*ptr++ ^ (u8)(crc >> 8)] ^ (crc << 8);
> >
> > return crc;
> > }
> >
> >
>
> The attached patch is on top of the one above. I tested it in user space,
> where it is around 20% faster (with a full size table). I also tried swab16 but
> there was no gain from that (perhaps around a 2% loss).

I don't get a significant difference in performance.
gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)

2018-08-13 04:44:11

by Chaitanya Kulkarni

Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.



On 8/10/18, 12:13 PM, Jeff Lien wrote:

This patch provides a performance improvement for the CRC16 calculations done in read/write
workloads using the T10 Type 1/2/3 guard field. For example, today with sequential write
workloads (one thread/CPU of IO) we consume 100% of the CPU because of the CRC16 computation
bottleneck. Today's block devices are considerably faster, but the CRC16 calculation prevents
folks from utilizing the throughput of such devices. To speed up this calculation and expose
the block device throughput, we slice the old single byte for loop into a 16 byte for loop,
with a larger CRC table to match. The result has shown 5x performance improvements on various
big endian and little endian systems running the 4.18.0 kernel version.

FIO Sequential Write, 64K Block Size, Queue Depth 64
BE Base Kernel: bw=201.5 MiB/s
BE Modified CRC Calc: bw=968.1 MiB/s
4.80x performance improvement

LE Base Kernel: bw=357 MiB/s
LE Modified CRC Calc: bw=1964 MiB/s
5.51x performance improvement

FIO Sequential Read, 64K Block Size, Queue Depth 64
BE Base Kernel: bw=611.2 MiB/s
BE Modified CRC calc: bw=684.9 MiB/s
1.12x performance improvement

LE Base Kernel: bw=797 MiB/s
LE Modified CRC Calc: bw=2730 MiB/s
3.42x performance improvement

Thanks for doing this. Can you please share the original fio config file?


Reviewed-by: Dave Darrington <[email protected]>
Reviewed-by: Jeff Furlong <[email protected]>
Signed-off-by: Jeff Lien <[email protected]>
---
crypto/crct10dif_common.c | 605 +++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 569 insertions(+), 36 deletions(-)

diff --git a/crypto/crct10dif_common.c b/crypto/crct10dif_common.c
index b2fab36..40e1d6c 100644
--- a/crypto/crct10dif_common.c
+++ b/crypto/crct10dif_common.c
@@ -32,47 +32,580 @@
* x^16 + x^15 + x^11 + x^9 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1
* gt: 0x8bb7
*/
-static const __u16 t10_dif_crc_table[256] = {
- 0x0000, 0x8BB7, 0x9CD9, 0x176E, 0xB205, 0x39B2, 0x2EDC, 0xA56B,
- 0xEFBD, 0x640A, 0x7364, 0xF8D3, 0x5DB8, 0xD60F, 0xC161, 0x4AD6,
- 0x54CD, 0xDF7A, 0xC814, 0x43A3, 0xE6C8, 0x6D7F, 0x7A11, 0xF1A6,
- 0xBB70, 0x30C7, 0x27A9, 0xAC1E, 0x0975, 0x82C2, 0x95AC, 0x1E1B,
- 0xA99A, 0x222D, 0x3543, 0xBEF4, 0x1B9F, 0x9028, 0x8746, 0x0CF1,
- 0x4627, 0xCD90, 0xDAFE, 0x5149, 0xF422, 0x7F95, 0x68FB, 0xE34C,
- 0xFD57, 0x76E0, 0x618E, 0xEA39, 0x4F52, 0xC4E5, 0xD38B, 0x583C,
- 0x12EA, 0x995D, 0x8E33, 0x0584, 0xA0EF, 0x2B58, 0x3C36, 0xB781,
- 0xD883, 0x5334, 0x445A, 0xCFED, 0x6A86, 0xE131, 0xF65F, 0x7DE8,
- 0x373E, 0xBC89, 0xABE7, 0x2050, 0x853B, 0x0E8C, 0x19E2, 0x9255,
- 0x8C4E, 0x07F9, 0x1097, 0x9B20, 0x3E4B, 0xB5FC, 0xA292, 0x2925,
- 0x63F3, 0xE844, 0xFF2A, 0x749D, 0xD1F6, 0x5A41, 0x4D2F, 0xC698,
- 0x7119, 0xFAAE, 0xEDC0, 0x6677, 0xC31C, 0x48AB, 0x5FC5, 0xD472,
- 0x9EA4, 0x1513, 0x027D, 0x89CA, 0x2CA1, 0xA716, 0xB078, 0x3BCF,
- 0x25D4, 0xAE63, 0xB90D, 0x32BA, 0x97D1, 0x1C66, 0x0B08, 0x80BF,
- 0xCA69, 0x41DE, 0x56B0, 0xDD07, 0x786C, 0xF3DB, 0xE4B5, 0x6F02,
- 0x3AB1, 0xB106, 0xA668, 0x2DDF, 0x88B4, 0x0303, 0x146D, 0x9FDA,
- 0xD50C, 0x5EBB, 0x49D5, 0xC262, 0x6709, 0xECBE, 0xFBD0, 0x7067,
- 0x6E7C, 0xE5CB, 0xF2A5, 0x7912, 0xDC79, 0x57CE, 0x40A0, 0xCB17,
- 0x81C1, 0x0A76, 0x1D18, 0x96AF, 0x33C4, 0xB873, 0xAF1D, 0x24AA,
- 0x932B, 0x189C, 0x0FF2, 0x8445, 0x212E, 0xAA99, 0xBDF7, 0x3640,
- 0x7C96, 0xF721, 0xE04F, 0x6BF8, 0xCE93, 0x4524, 0x524A, 0xD9FD,
- 0xC7E6, 0x4C51, 0x5B3F, 0xD088, 0x75E3, 0xFE54, 0xE93A, 0x628D,
- 0x285B, 0xA3EC, 0xB482, 0x3F35, 0x9A5E, 0x11E9, 0x0687, 0x8D30,
- 0xE232, 0x6985, 0x7EEB, 0xF55C, 0x5037, 0xDB80, 0xCCEE, 0x4759,
- 0x0D8F, 0x8638, 0x9156, 0x1AE1, 0xBF8A, 0x343D, 0x2353, 0xA8E4,
- 0xB6FF, 0x3D48, 0x2A26, 0xA191, 0x04FA, 0x8F4D, 0x9823, 0x1394,
- 0x5942, 0xD2F5, 0xC59B, 0x4E2C, 0xEB47, 0x60F0, 0x779E, 0xFC29,
- 0x4BA8, 0xC01F, 0xD771, 0x5CC6, 0xF9AD, 0x721A, 0x6574, 0xEEC3,
- 0xA415, 0x2FA2, 0x38CC, 0xB37B, 0x1610, 0x9DA7, 0x8AC9, 0x017E,
- 0x1F65, 0x94D2, 0x83BC, 0x080B, 0xAD60, 0x26D7, 0x31B9, 0xBA0E,
- 0xF0D8, 0x7B6F, 0x6C01, 0xE7B6, 0x42DD, 0xC96A, 0xDE04, 0x55B3

Please add or update the comment here about how this table is generated.
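
One possible way to describe the generation, offered as a sketch and not as the method actually used to produce the patch: entry [k][b] appears to be the CRC (polynomial 0x8BB7, initial value 0) of the single byte b followed by k zero bytes. The function name `t10dif_slice_entry` is hypothetical.

```c
#include <stdint.h>

/*
 * Entry [k][b] of a slice-by-N table set, computed from the definition:
 * the CRC of byte b followed by k zero bytes.
 */
uint16_t t10dif_slice_entry(unsigned int k, uint8_t b)
{
	uint16_t crc = (uint16_t)(b << 8);

	/* 8 shifts consume byte b; each further 8 shifts consume one zero byte. */
	for (unsigned int bit = 0; bit < 8 * (k + 1); bit++)
		crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x8BB7)
				     : (uint16_t)(crc << 1);
	return crc;
}
```

Under that reading, `t10dif_slice_entry(0, 1)` is 0x8BB7 and `t10dif_slice_entry(1, 1)` is 0x7562, which are the second entries of the first two tables in the patch. A comment along these lines, or a pointer to a small generator, would cover the request.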

+static const __u16 t10_dif_crc_table[16][256] = {
+ {
+ 0x0000u, 0x8BB7u, 0x9CD9u, 0x176Eu, 0xB205u, 0x39B2u, 0x2EDCu, 0xA56Bu,
+ 0xEFBDu, 0x640Au, 0x7364u, 0xF8D3u, 0x5DB8u, 0xD60Fu, 0xC161u, 0x4AD6u,
+ 0x54CDu, 0xDF7Au, 0xC814u, 0x43A3u, 0xE6C8u, 0x6D7Fu, 0x7A11u, 0xF1A6u,
+ 0xBB70u, 0x30C7u, 0x27A9u, 0xAC1Eu, 0x0975u, 0x82C2u, 0x95ACu, 0x1E1Bu,
+ 0xA99Au, 0x222Du, 0x3543u, 0xBEF4u, 0x1B9Fu, 0x9028u, 0x8746u, 0x0CF1u,
+ 0x4627u, 0xCD90u, 0xDAFEu, 0x5149u, 0xF422u, 0x7F95u, 0x68FBu, 0xE34Cu,
+ 0xFD57u, 0x76E0u, 0x618Eu, 0xEA39u, 0x4F52u, 0xC4E5u, 0xD38Bu, 0x583Cu,
+ 0x12EAu, 0x995Du, 0x8E33u, 0x0584u, 0xA0EFu, 0x2B58u, 0x3C36u, 0xB781u,
+ 0xD883u, 0x5334u, 0x445Au, 0xCFEDu, 0x6A86u, 0xE131u, 0xF65Fu, 0x7DE8u,
+ 0x373Eu, 0xBC89u, 0xABE7u, 0x2050u, 0x853Bu, 0x0E8Cu, 0x19E2u, 0x9255u,
+ 0x8C4Eu, 0x07F9u, 0x1097u, 0x9B20u, 0x3E4Bu, 0xB5FCu, 0xA292u, 0x2925u,
+ 0x63F3u, 0xE844u, 0xFF2Au, 0x749Du, 0xD1F6u, 0x5A41u, 0x4D2Fu, 0xC698u,
+ 0x7119u, 0xFAAEu, 0xEDC0u, 0x6677u, 0xC31Cu, 0x48ABu, 0x5FC5u, 0xD472u,
+ 0x9EA4u, 0x1513u, 0x027Du, 0x89CAu, 0x2CA1u, 0xA716u, 0xB078u, 0x3BCFu,
+ 0x25D4u, 0xAE63u, 0xB90Du, 0x32BAu, 0x97D1u, 0x1C66u, 0x0B08u, 0x80BFu,
+ 0xCA69u, 0x41DEu, 0x56B0u, 0xDD07u, 0x786Cu, 0xF3DBu, 0xE4B5u, 0x6F02u,
+ 0x3AB1u, 0xB106u, 0xA668u, 0x2DDFu, 0x88B4u, 0x0303u, 0x146Du, 0x9FDAu,
+ 0xD50Cu, 0x5EBBu, 0x49D5u, 0xC262u, 0x6709u, 0xECBEu, 0xFBD0u, 0x7067u,
+ 0x6E7Cu, 0xE5CBu, 0xF2A5u, 0x7912u, 0xDC79u, 0x57CEu, 0x40A0u, 0xCB17u,
+ 0x81C1u, 0x0A76u, 0x1D18u, 0x96AFu, 0x33C4u, 0xB873u, 0xAF1Du, 0x24AAu,
+ 0x932Bu, 0x189Cu, 0x0FF2u, 0x8445u, 0x212Eu, 0xAA99u, 0xBDF7u, 0x3640u,
+ 0x7C96u, 0xF721u, 0xE04Fu, 0x6BF8u, 0xCE93u, 0x4524u, 0x524Au, 0xD9FDu,
+ 0xC7E6u, 0x4C51u, 0x5B3Fu, 0xD088u, 0x75E3u, 0xFE54u, 0xE93Au, 0x628Du,
+ 0x285Bu, 0xA3ECu, 0xB482u, 0x3F35u, 0x9A5Eu, 0x11E9u, 0x0687u, 0x8D30u,
+ 0xE232u, 0x6985u, 0x7EEBu, 0xF55Cu, 0x5037u, 0xDB80u, 0xCCEEu, 0x4759u,
+ 0x0D8Fu, 0x8638u, 0x9156u, 0x1AE1u, 0xBF8Au, 0x343Du, 0x2353u, 0xA8E4u,
+ 0xB6FFu, 0x3D48u, 0x2A26u, 0xA191u, 0x04FAu, 0x8F4Du, 0x9823u, 0x1394u,
+ 0x5942u, 0xD2F5u, 0xC59Bu, 0x4E2Cu, 0xEB47u, 0x60F0u, 0x779Eu, 0xFC29u,
+ 0x4BA8u, 0xC01Fu, 0xD771u, 0x5CC6u, 0xF9ADu, 0x721Au, 0x6574u, 0xEEC3u,
+ 0xA415u, 0x2FA2u, 0x38CCu, 0xB37Bu, 0x1610u, 0x9DA7u, 0x8AC9u, 0x017Eu,
+ 0x1F65u, 0x94D2u, 0x83BCu, 0x080Bu, 0xAD60u, 0x26D7u, 0x31B9u, 0xBA0Eu,
+ 0xF0D8u, 0x7B6Fu, 0x6C01u, 0xE7B6u, 0x42DDu, 0xC96Au, 0xDE04u, 0x55B3u
+ },
+ {
+ 0x0000u, 0x7562u, 0xEAC4u, 0x9FA6u, 0x5E3Fu, 0x2B5Du, 0xB4FBu, 0xC199u,
+ 0xBC7Eu, 0xC91Cu, 0x56BAu, 0x23D8u, 0xE241u, 0x9723u, 0x0885u, 0x7DE7u,
+ 0xF34Bu, 0x8629u, 0x198Fu, 0x6CEDu, 0xAD74u, 0xD816u, 0x47B0u, 0x32D2u,
+ 0x4F35u, 0x3A57u, 0xA5F1u, 0xD093u, 0x110Au, 0x6468u, 0xFBCEu, 0x8EACu,
+ 0x6D21u, 0x1843u, 0x87E5u, 0xF287u, 0x331Eu, 0x467Cu, 0xD9DAu, 0xACB8u,
+ 0xD15Fu, 0xA43Du, 0x3B9Bu, 0x4EF9u, 0x8F60u, 0xFA02u, 0x65A4u, 0x10C6u,
+ 0x9E6Au, 0xEB08u, 0x74AEu, 0x01CCu, 0xC055u, 0xB537u, 0x2A91u, 0x5FF3u,
+ 0x2214u, 0x5776u, 0xC8D0u, 0xBDB2u, 0x7C2Bu, 0x0949u, 0x96EFu, 0xE38Du,
+ 0xDA42u, 0xAF20u, 0x3086u, 0x45E4u, 0x847Du, 0xF11Fu, 0x6EB9u, 0x1BDBu,
+ 0x663Cu, 0x135Eu, 0x8CF8u, 0xF99Au, 0x3803u, 0x4D61u, 0xD2C7u, 0xA7A5u,
+ 0x2909u, 0x5C6Bu, 0xC3CDu, 0xB6AFu, 0x7736u, 0x0254u, 0x9DF2u, 0xE890u,
+ 0x9577u, 0xE015u, 0x7FB3u, 0x0AD1u, 0xCB48u, 0xBE2Au, 0x218Cu, 0x54EEu,
+ 0xB763u, 0xC201u, 0x5DA7u, 0x28C5u, 0xE95Cu, 0x9C3Eu, 0x0398u, 0x76FAu,
+ 0x0B1Du, 0x7E7Fu, 0xE1D9u, 0x94BBu, 0x5522u, 0x2040u, 0xBFE6u, 0xCA84u,
+ 0x4428u, 0x314Au, 0xAEECu, 0xDB8Eu, 0x1A17u, 0x6F75u, 0xF0D3u, 0x85B1u,
+ 0xF856u, 0x8D34u, 0x1292u, 0x67F0u, 0xA669u, 0xD30Bu, 0x4CADu, 0x39CFu,
+ 0x3F33u, 0x4A51u, 0xD5F7u, 0xA095u, 0x610Cu, 0x146Eu, 0x8BC8u, 0xFEAAu,
+ 0x834Du, 0xF62Fu, 0x6989u, 0x1CEBu, 0xDD72u, 0xA810u, 0x37B6u, 0x42D4u,
+ 0xCC78u, 0xB91Au, 0x26BCu, 0x53DEu, 0x9247u, 0xE725u, 0x7883u, 0x0DE1u,
+ 0x7006u, 0x0564u, 0x9AC2u, 0xEFA0u, 0x2E39u, 0x5B5Bu, 0xC4FDu, 0xB19Fu,
+ 0x5212u, 0x2770u, 0xB8D6u, 0xCDB4u, 0x0C2Du, 0x794Fu, 0xE6E9u, 0x938Bu,
+ 0xEE6Cu, 0x9B0Eu, 0x04A8u, 0x71CAu, 0xB053u, 0xC531u, 0x5A97u, 0x2FF5u,
+ 0xA159u, 0xD43Bu, 0x4B9Du, 0x3EFFu, 0xFF66u, 0x8A04u, 0x15A2u, 0x60C0u,
+ 0x1D27u, 0x6845u, 0xF7E3u, 0x8281u, 0x4318u, 0x367Au, 0xA9DCu, 0xDCBEu,
+ 0xE571u, 0x9013u, 0x0FB5u, 0x7AD7u, 0xBB4Eu, 0xCE2Cu, 0x518Au, 0x24E8u,
+ 0x590Fu, 0x2C6Du, 0xB3CBu, 0xC6A9u, 0x0730u, 0x7252u, 0xEDF4u, 0x9896u,
+ 0x163Au, 0x6358u, 0xFCFEu, 0x899Cu, 0x4805u, 0x3D67u, 0xA2C1u, 0xD7A3u,
+ 0xAA44u, 0xDF26u, 0x4080u, 0x35E2u, 0xF47Bu, 0x8119u, 0x1EBFu, 0x6BDDu,
+ 0x8850u, 0xFD32u, 0x6294u, 0x17F6u, 0xD66Fu, 0xA30Du, 0x3CABu, 0x49C9u,
+ 0x342Eu, 0x414Cu, 0xDEEAu, 0xAB88u, 0x6A11u, 0x1F73u, 0x80D5u, 0xF5B7u,
+ 0x7B1Bu, 0x0E79u, 0x91DFu, 0xE4BDu, 0x2524u, 0x5046u, 0xCFE0u, 0xBA82u,
+ 0xC765u, 0xB207u, 0x2DA1u, 0x58C3u, 0x995Au, 0xEC38u, 0x739Eu, 0x06FCu
+ },
+ {
+ 0x0000u, 0x7E66u, 0xFCCCu, 0x82AAu, 0x722Fu, 0x0C49u, 0x8EE3u, 0xF085u,
+ 0xE45Eu, 0x9A38u, 0x1892u, 0x66F4u, 0x9671u, 0xE817u, 0x6ABDu, 0x14DBu,
+ 0x430Bu, 0x3D6Du, 0xBFC7u, 0xC1A1u, 0x3124u, 0x4F42u, 0xCDE8u, 0xB38Eu,
+ 0xA755u, 0xD933u, 0x5B99u, 0x25FFu, 0xD57Au, 0xAB1Cu, 0x29B6u, 0x57D0u,
+ 0x8616u, 0xF870u, 0x7ADAu, 0x04BCu, 0xF439u, 0x8A5Fu, 0x08F5u, 0x7693u,
+ 0x6248u, 0x1C2Eu, 0x9E84u, 0xE0E2u, 0x1067u, 0x6E01u, 0xECABu, 0x92CDu,
+ 0xC51Du, 0xBB7Bu, 0x39D1u, 0x47B7u, 0xB732u, 0xC954u, 0x4BFEu, 0x3598u,
+ 0x2143u, 0x5F25u, 0xDD8Fu, 0xA3E9u, 0x536Cu, 0x2D0Au, 0xAFA0u, 0xD1C6u,
+ 0x879Bu, 0xF9FDu, 0x7B57u, 0x0531u, 0xF5B4u, 0x8BD2u, 0x0978u, 0x771Eu,
+ 0x63C5u, 0x1DA3u, 0x9F09u, 0xE16Fu, 0x11EAu, 0x6F8Cu, 0xED26u, 0x9340u,
+ 0xC490u, 0xBAF6u, 0x385Cu, 0x463Au, 0xB6BFu, 0xC8D9u, 0x4A73u, 0x3415u,
+ 0x20CEu, 0x5EA8u, 0xDC02u, 0xA264u, 0x52E1u, 0x2C87u, 0xAE2Du, 0xD04Bu,
+ 0x018Du, 0x7FEBu, 0xFD41u, 0x8327u, 0x73A2u, 0x0DC4u, 0x8F6Eu, 0xF108u,
+ 0xE5D3u, 0x9BB5u, 0x191Fu, 0x6779u, 0x97FCu, 0xE99Au, 0x6B30u, 0x1556u,
+ 0x4286u, 0x3CE0u, 0xBE4Au, 0xC02Cu, 0x30A9u, 0x4ECFu, 0xCC65u, 0xB203u,
+ 0xA6D8u, 0xD8BEu, 0x5A14u, 0x2472u, 0xD4F7u, 0xAA91u, 0x283Bu, 0x565Du,
+ 0x8481u, 0xFAE7u, 0x784Du, 0x062Bu, 0xF6AEu, 0x88C8u, 0x0A62u, 0x7404u,
+ 0x60DFu, 0x1EB9u, 0x9C13u, 0xE275u, 0x12F0u, 0x6C96u, 0xEE3Cu, 0x905Au,
+ 0xC78Au, 0xB9ECu, 0x3B46u, 0x4520u, 0xB5A5u, 0xCBC3u, 0x4969u, 0x370Fu,
+ 0x23D4u, 0x5DB2u, 0xDF18u, 0xA17Eu, 0x51FBu, 0x2F9Du, 0xAD37u, 0xD351u,
+ 0x0297u, 0x7CF1u, 0xFE5Bu, 0x803Du, 0x70B8u, 0x0EDEu, 0x8C74u, 0xF212u,
+ 0xE6C9u, 0x98AFu, 0x1A05u, 0x6463u, 0x94E6u, 0xEA80u, 0x682Au, 0x164Cu,
+ 0x419Cu, 0x3FFAu, 0xBD50u, 0xC336u, 0x33B3u, 0x4DD5u, 0xCF7Fu, 0xB119u,
+ 0xA5C2u, 0xDBA4u, 0x590Eu, 0x2768u, 0xD7EDu, 0xA98Bu, 0x2B21u, 0x5547u,
+ 0x031Au, 0x7D7Cu, 0xFFD6u, 0x81B0u, 0x7135u, 0x0F53u, 0x8DF9u, 0xF39Fu,
+ 0xE744u, 0x9922u, 0x1B88u, 0x65EEu, 0x956Bu, 0xEB0Du, 0x69A7u, 0x17C1u,
+ 0x4011u, 0x3E77u, 0xBCDDu, 0xC2BBu, 0x323Eu, 0x4C58u, 0xCEF2u, 0xB094u,
+ 0xA44Fu, 0xDA29u, 0x5883u, 0x26E5u, 0xD660u, 0xA806u, 0x2AACu, 0x54CAu,
+ 0x850Cu, 0xFB6Au, 0x79C0u, 0x07A6u, 0xF723u, 0x8945u, 0x0BEFu, 0x7589u,
+ 0x6152u, 0x1F34u, 0x9D9Eu, 0xE3F8u, 0x137Du, 0x6D1Bu, 0xEFB1u, 0x91D7u,
+ 0xC607u, 0xB861u, 0x3ACBu, 0x44ADu, 0xB428u, 0xCA4Eu, 0x48E4u, 0x3682u,
+ 0x2259u, 0x5C3Fu, 0xDE95u, 0xA0F3u, 0x5076u, 0x2E10u, 0xACBAu, 0xD2DCu
+ },
+ {
+ 0x0000u, 0x82B5u, 0x8EDDu, 0x0C68u, 0x960Du, 0x14B8u, 0x18D0u, 0x9A65u,
+ 0xA7ADu, 0x2518u, 0x2970u, 0xABC5u, 0x31A0u, 0xB315u, 0xBF7Du, 0x3DC8u,
+ 0xC4EDu, 0x4658u, 0x4A30u, 0xC885u, 0x52E0u, 0xD055u, 0xDC3Du, 0x5E88u,
+ 0x6340u, 0xE1F5u, 0xED9Du, 0x6F28u, 0xF54Du, 0x77F8u, 0x7B90u, 0xF925u,
+ 0x026Du, 0x80D8u, 0x8CB0u, 0x0E05u, 0x9460u, 0x16D5u, 0x1ABDu, 0x9808u,
+ 0xA5C0u, 0x2775u, 0x2B1Du, 0xA9A8u, 0x33CDu, 0xB178u, 0xBD10u, 0x3FA5u,
+ 0xC680u, 0x4435u, 0x485Du, 0xCAE8u, 0x508Du, 0xD238u, 0xDE50u, 0x5CE5u,
+ 0x612Du, 0xE398u, 0xEFF0u, 0x6D45u, 0xF720u, 0x7595u, 0x79FDu, 0xFB48u,
+ 0x04DAu, 0x866Fu, 0x8A07u, 0x08B2u, 0x92D7u, 0x1062u, 0x1C0Au, 0x9EBFu,
+ 0xA377u, 0x21C2u, 0x2DAAu, 0xAF1Fu, 0x357Au, 0xB7CFu, 0xBBA7u, 0x3912u,
+ 0xC037u, 0x4282u, 0x4EEAu, 0xCC5Fu, 0x563Au, 0xD48Fu, 0xD8E7u, 0x5A52u,
+ 0x679Au, 0xE52Fu, 0xE947u, 0x6BF2u, 0xF197u, 0x7322u, 0x7F4Au, 0xFDFFu,
+ 0x06B7u, 0x8402u, 0x886Au, 0x0ADFu, 0x90BAu, 0x120Fu, 0x1E67u, 0x9CD2u,
+ 0xA11Au, 0x23AFu, 0x2FC7u, 0xAD72u, 0x3717u, 0xB5A2u, 0xB9CAu, 0x3B7Fu,
+ 0xC25Au, 0x40EFu, 0x4C87u, 0xCE32u, 0x5457u, 0xD6E2u, 0xDA8Au, 0x583Fu,
+ 0x65F7u, 0xE742u, 0xEB2Au, 0x699Fu, 0xF3FAu, 0x714Fu, 0x7D27u, 0xFF92u,
+ 0x09B4u, 0x8B01u, 0x8769u, 0x05DCu, 0x9FB9u, 0x1D0Cu, 0x1164u, 0x93D1u,
+ 0xAE19u, 0x2CACu, 0x20C4u, 0xA271u, 0x3814u, 0xBAA1u, 0xB6C9u, 0x347Cu,
+ 0xCD59u, 0x4FECu, 0x4384u, 0xC131u, 0x5B54u, 0xD9E1u, 0xD589u, 0x573Cu,
+ 0x6AF4u, 0xE841u, 0xE429u, 0x669Cu, 0xFCF9u, 0x7E4Cu, 0x7224u, 0xF091u,
+ 0x0BD9u, 0x896Cu, 0x8504u, 0x07B1u, 0x9DD4u, 0x1F61u, 0x1309u, 0x91BCu,
+ 0xAC74u, 0x2EC1u, 0x22A9u, 0xA01Cu, 0x3A79u, 0xB8CCu, 0xB4A4u, 0x3611u,
+ 0xCF34u, 0x4D81u, 0x41E9u, 0xC35Cu, 0x5939u, 0xDB8Cu, 0xD7E4u, 0x5551u,
+ 0x6899u, 0xEA2Cu, 0xE644u, 0x64F1u, 0xFE94u, 0x7C21u, 0x7049u, 0xF2FCu,
+ 0x0D6Eu, 0x8FDBu, 0x83B3u, 0x0106u, 0x9B63u, 0x19D6u, 0x15BEu, 0x970Bu,
+ 0xAAC3u, 0x2876u, 0x241Eu, 0xA6ABu, 0x3CCEu, 0xBE7Bu, 0xB213u, 0x30A6u,
+ 0xC983u, 0x4B36u, 0x475Eu, 0xC5EBu, 0x5F8Eu, 0xDD3Bu, 0xD153u, 0x53E6u,
+ 0x6E2Eu, 0xEC9Bu, 0xE0F3u, 0x6246u, 0xF823u, 0x7A96u, 0x76FEu, 0xF44Bu,
+ 0x0F03u, 0x8DB6u, 0x81DEu, 0x036Bu, 0x990Eu, 0x1BBBu, 0x17D3u, 0x9566u,
+ 0xA8AEu, 0x2A1Bu, 0x2673u, 0xA4C6u, 0x3EA3u, 0xBC16u, 0xB07Eu, 0x32CBu,
+ 0xCBEEu, 0x495Bu, 0x4533u, 0xC786u, 0x5DE3u, 0xDF56u, 0xD33Eu, 0x518Bu,
+ 0x6C43u, 0xEEF6u, 0xE29Eu, 0x602Bu, 0xFA4Eu, 0x78FBu, 0x7493u, 0xF626u
+ },
+ {
+ 0x0000u, 0x1368u, 0x26D0u, 0x35B8u, 0x4DA0u, 0x5EC8u, 0x6B70u, 0x7818u,
+ 0x9B40u, 0x8828u, 0xBD90u, 0xAEF8u, 0xD6E0u, 0xC588u, 0xF030u, 0xE358u,
+ 0xBD37u, 0xAE5Fu, 0x9BE7u, 0x888Fu, 0xF097u, 0xE3FFu, 0xD647u, 0xC52Fu,
+ 0x2677u, 0x351Fu, 0x00A7u, 0x13CFu, 0x6BD7u, 0x78BFu, 0x4D07u, 0x5E6Fu,
+ 0xF1D9u, 0xE2B1u, 0xD709u, 0xC461u, 0xBC79u, 0xAF11u, 0x9AA9u, 0x89C1u,
+ 0x6A99u, 0x79F1u, 0x4C49u, 0x5F21u, 0x2739u, 0x3451u, 0x01E9u, 0x1281u,
+ 0x4CEEu, 0x5F86u, 0x6A3Eu, 0x7956u, 0x014Eu, 0x1226u, 0x279Eu, 0x34F6u,
+ 0xD7AEu, 0xC4C6u, 0xF17Eu, 0xE216u, 0x9A0Eu, 0x8966u, 0xBCDEu, 0xAFB6u,
+ 0x6805u, 0x7B6Du, 0x4ED5u, 0x5DBDu, 0x25A5u, 0x36CDu, 0x0375u, 0x101Du,
+ 0xF345u, 0xE02Du, 0xD595u, 0xC6FDu, 0xBEE5u, 0xAD8Du, 0x9835u, 0x8B5Du,
+ 0xD532u, 0xC65Au, 0xF3E2u, 0xE08Au, 0x9892u, 0x8BFAu, 0xBE42u, 0xAD2Au,
+ 0x4E72u, 0x5D1Au, 0x68A2u, 0x7BCAu, 0x03D2u, 0x10BAu, 0x2502u, 0x366Au,
+ 0x99DCu, 0x8AB4u, 0xBF0Cu, 0xAC64u, 0xD47Cu, 0xC714u, 0xF2ACu, 0xE1C4u,
+ 0x029Cu, 0x11F4u, 0x244Cu, 0x3724u, 0x4F3Cu, 0x5C54u, 0x69ECu, 0x7A84u,
+ 0x24EBu, 0x3783u, 0x023Bu, 0x1153u, 0x694Bu, 0x7A23u, 0x4F9Bu, 0x5CF3u,
+ 0xBFABu, 0xACC3u, 0x997Bu, 0x8A13u, 0xF20Bu, 0xE163u, 0xD4DBu, 0xC7B3u,
+ 0xD00Au, 0xC362u, 0xF6DAu, 0xE5B2u, 0x9DAAu, 0x8EC2u, 0xBB7Au, 0xA812u,
+ 0x4B4Au, 0x5822u, 0x6D9Au, 0x7EF2u, 0x06EAu, 0x1582u, 0x203Au, 0x3352u,
+ 0x6D3Du, 0x7E55u, 0x4BEDu, 0x5885u, 0x209Du, 0x33F5u, 0x064Du, 0x1525u,
+ 0xF67Du, 0xE515u, 0xD0ADu, 0xC3C5u, 0xBBDDu, 0xA8B5u, 0x9D0Du, 0x8E65u,
+ 0x21D3u, 0x32BBu, 0x0703u, 0x146Bu, 0x6C73u, 0x7F1Bu, 0x4AA3u, 0x59CBu,
+ 0xBA93u, 0xA9FBu, 0x9C43u, 0x8F2Bu, 0xF733u, 0xE45Bu, 0xD1E3u, 0xC28Bu,
+ 0x9CE4u, 0x8F8Cu, 0xBA34u, 0xA95Cu, 0xD144u, 0xC22Cu, 0xF794u, 0xE4FCu,
+ 0x07A4u, 0x14CCu, 0x2174u, 0x321Cu, 0x4A04u, 0x596Cu, 0x6CD4u, 0x7FBCu,
+ 0xB80Fu, 0xAB67u, 0x9EDFu, 0x8DB7u, 0xF5AFu, 0xE6C7u, 0xD37Fu, 0xC017u,
+ 0x234Fu, 0x3027u, 0x059Fu, 0x16F7u, 0x6EEFu, 0x7D87u, 0x483Fu, 0x5B57u,
+ 0x0538u, 0x1650u, 0x23E8u, 0x3080u, 0x4898u, 0x5BF0u, 0x6E48u, 0x7D20u,
+ 0x9E78u, 0x8D10u, 0xB8A8u, 0xABC0u, 0xD3D8u, 0xC0B0u, 0xF508u, 0xE660u,
+ 0x49D6u, 0x5ABEu, 0x6F06u, 0x7C6Eu, 0x0476u, 0x171Eu, 0x22A6u, 0x31CEu,
+ 0xD296u, 0xC1FEu, 0xF446u, 0xE72Eu, 0x9F36u, 0x8C5Eu, 0xB9E6u, 0xAA8Eu,
+ 0xF4E1u, 0xE789u, 0xD231u, 0xC159u, 0xB941u, 0xAA29u, 0x9F91u, 0x8CF9u,
+ 0x6FA1u, 0x7CC9u, 0x4971u, 0x5A19u, 0x2201u, 0x3169u, 0x04D1u, 0x17B9u
+ },
+ {
+ 0x0000u, 0x2BA3u, 0x5746u, 0x7CE5u, 0xAE8Cu, 0x852Fu, 0xF9CAu, 0xD269u,
+ 0xD6AFu, 0xFD0Cu, 0x81E9u, 0xAA4Au, 0x7823u, 0x5380u, 0x2F65u, 0x04C6u,
+ 0x26E9u, 0x0D4Au, 0x71AFu, 0x5A0Cu, 0x8865u, 0xA3C6u, 0xDF23u, 0xF480u,
+ 0xF046u, 0xDBE5u, 0xA700u, 0x8CA3u, 0x5ECAu, 0x7569u, 0x098Cu, 0x222Fu,
+ 0x4DD2u, 0x6671u, 0x1A94u, 0x3137u, 0xE35Eu, 0xC8FDu, 0xB418u, 0x9FBBu,
+ 0x9B7Du, 0xB0DEu, 0xCC3Bu, 0xE798u, 0x35F1u, 0x1E52u, 0x62B7u, 0x4914u,
+ 0x6B3Bu, 0x4098u, 0x3C7Du, 0x17DEu, 0xC5B7u, 0xEE14u, 0x92F1u, 0xB952u,
+ 0xBD94u, 0x9637u, 0xEAD2u, 0xC171u, 0x1318u, 0x38BBu, 0x445Eu, 0x6FFDu,
+ 0x9BA4u, 0xB007u, 0xCCE2u, 0xE741u, 0x3528u, 0x1E8Bu, 0x626Eu, 0x49CDu,
+ 0x4D0Bu, 0x66A8u, 0x1A4Du, 0x31EEu, 0xE387u, 0xC824u, 0xB4C1u, 0x9F62u,
+ 0xBD4Du, 0x96EEu, 0xEA0Bu, 0xC1A8u, 0x13C1u, 0x3862u, 0x4487u, 0x6F24u,
+ 0x6BE2u, 0x4041u, 0x3CA4u, 0x1707u, 0xC56Eu, 0xEECDu, 0x9228u, 0xB98Bu,
+ 0xD676u, 0xFDD5u, 0x8130u, 0xAA93u, 0x78FAu, 0x5359u, 0x2FBCu, 0x041Fu,
+ 0x00D9u, 0x2B7Au, 0x579Fu, 0x7C3Cu, 0xAE55u, 0x85F6u, 0xF913u, 0xD2B0u,
+ 0xF09Fu, 0xDB3Cu, 0xA7D9u, 0x8C7Au, 0x5E13u, 0x75B0u, 0x0955u, 0x22F6u,
+ 0x2630u, 0x0D93u, 0x7176u, 0x5AD5u, 0x88BCu, 0xA31Fu, 0xDFFAu, 0xF459u,
+ 0xBCFFu, 0x975Cu, 0xEBB9u, 0xC01Au, 0x1273u, 0x39D0u, 0x4535u, 0x6E96u,
+ 0x6A50u, 0x41F3u, 0x3D16u, 0x16B5u, 0xC4DCu, 0xEF7Fu, 0x939Au, 0xB839u,
+ 0x9A16u, 0xB1B5u, 0xCD50u, 0xE6F3u, 0x349Au, 0x1F39u, 0x63DCu, 0x487Fu,
+ 0x4CB9u, 0x671Au, 0x1BFFu, 0x305Cu, 0xE235u, 0xC996u, 0xB573u, 0x9ED0u,
+ 0xF12Du, 0xDA8Eu, 0xA66Bu, 0x8DC8u, 0x5FA1u, 0x7402u, 0x08E7u, 0x2344u,
+ 0x2782u, 0x0C21u, 0x70C4u, 0x5B67u, 0x890Eu, 0xA2ADu, 0xDE48u, 0xF5EBu,
+ 0xD7C4u, 0xFC67u, 0x8082u, 0xAB21u, 0x7948u, 0x52EBu, 0x2E0Eu, 0x05ADu,
+ 0x016Bu, 0x2AC8u, 0x562Du, 0x7D8Eu, 0xAFE7u, 0x8444u, 0xF8A1u, 0xD302u,
+ 0x275Bu, 0x0CF8u, 0x701Du, 0x5BBEu, 0x89D7u, 0xA274u, 0xDE91u, 0xF532u,
+ 0xF1F4u, 0xDA57u, 0xA6B2u, 0x8D11u, 0x5F78u, 0x74DBu, 0x083Eu, 0x239Du,
+ 0x01B2u, 0x2A11u, 0x56F4u, 0x7D57u, 0xAF3Eu, 0x849Du, 0xF878u, 0xD3DBu,
+ 0xD71Du, 0xFCBEu, 0x805Bu, 0xABF8u, 0x7991u, 0x5232u, 0x2ED7u, 0x0574u,
+ 0x6A89u, 0x412Au, 0x3DCFu, 0x166Cu, 0xC405u, 0xEFA6u, 0x9343u, 0xB8E0u,
+ 0xBC26u, 0x9785u, 0xEB60u, 0xC0C3u, 0x12AAu, 0x3909u, 0x45ECu, 0x6E4Fu,
+ 0x4C60u, 0x67C3u, 0x1B26u, 0x3085u, 0xE2ECu, 0xC94Fu, 0xB5AAu, 0x9E09u,
+ 0x9ACFu, 0xB16Cu, 0xCD89u, 0xE62Au, 0x3443u, 0x1FE0u, 0x6305u, 0x48A6u
+ },
+ {
+ 0x0000u, 0xF249u, 0x6F25u, 0x9D6Cu, 0xDE4Au, 0x2C03u, 0xB16Fu, 0x4326u,
+ 0x3723u, 0xC56Au, 0x5806u, 0xAA4Fu, 0xE969u, 0x1B20u, 0x864Cu, 0x7405u,
+ 0x6E46u, 0x9C0Fu, 0x0163u, 0xF32Au, 0xB00Cu, 0x4245u, 0xDF29u, 0x2D60u,
+ 0x5965u, 0xAB2Cu, 0x3640u, 0xC409u, 0x872Fu, 0x7566u, 0xE80Au, 0x1A43u,
+ 0xDC8Cu, 0x2EC5u, 0xB3A9u, 0x41E0u, 0x02C6u, 0xF08Fu, 0x6DE3u, 0x9FAAu,
+ 0xEBAFu, 0x19E6u, 0x848Au, 0x76C3u, 0x35E5u, 0xC7ACu, 0x5AC0u, 0xA889u,
+ 0xB2CAu, 0x4083u, 0xDDEFu, 0x2FA6u, 0x6C80u, 0x9EC9u, 0x03A5u, 0xF1ECu,
+ 0x85E9u, 0x77A0u, 0xEACCu, 0x1885u, 0x5BA3u, 0xA9EAu, 0x3486u, 0xC6CFu,
+ 0x32AFu, 0xC0E6u, 0x5D8Au, 0xAFC3u, 0xECE5u, 0x1EACu, 0x83C0u, 0x7189u,
+ 0x058Cu, 0xF7C5u, 0x6AA9u, 0x98E0u, 0xDBC6u, 0x298Fu, 0xB4E3u, 0x46AAu,
+ 0x5CE9u, 0xAEA0u, 0x33CCu, 0xC185u, 0x82A3u, 0x70EAu, 0xED86u, 0x1FCFu,
+ 0x6BCAu, 0x9983u, 0x04EFu, 0xF6A6u, 0xB580u, 0x47C9u, 0xDAA5u, 0x28ECu,
+ 0xEE23u, 0x1C6Au, 0x8106u, 0x734Fu, 0x3069u, 0xC220u, 0x5F4Cu, 0xAD05u,
+ 0xD900u, 0x2B49u, 0xB625u, 0x446Cu, 0x074Au, 0xF503u, 0x686Fu, 0x9A26u,
+ 0x8065u, 0x722Cu, 0xEF40u, 0x1D09u, 0x5E2Fu, 0xAC66u, 0x310Au, 0xC343u,
+ 0xB746u, 0x450Fu, 0xD863u, 0x2A2Au, 0x690Cu, 0x9B45u, 0x0629u, 0xF460u,
+ 0x655Eu, 0x9717u, 0x0A7Bu, 0xF832u, 0xBB14u, 0x495Du, 0xD431u, 0x2678u,
+ 0x527Du, 0xA034u, 0x3D58u, 0xCF11u, 0x8C37u, 0x7E7Eu, 0xE312u, 0x115Bu,
+ 0x0B18u, 0xF951u, 0x643Du, 0x9674u, 0xD552u, 0x271Bu, 0xBA77u, 0x483Eu,
+ 0x3C3Bu, 0xCE72u, 0x531Eu, 0xA157u, 0xE271u, 0x1038u, 0x8D54u, 0x7F1Du,
+ 0xB9D2u, 0x4B9Bu, 0xD6F7u, 0x24BEu, 0x6798u, 0x95D1u, 0x08BDu, 0xFAF4u,
+ 0x8EF1u, 0x7CB8u, 0xE1D4u, 0x139Du, 0x50BBu, 0xA2F2u, 0x3F9Eu, 0xCDD7u,
+ 0xD794u, 0x25DDu, 0xB8B1u, 0x4AF8u, 0x09DEu, 0xFB97u, 0x66FBu, 0x94B2u,
+ 0xE0B7u, 0x12FEu, 0x8F92u, 0x7DDBu, 0x3EFDu, 0xCCB4u, 0x51D8u, 0xA391u,
+ 0x57F1u, 0xA5B8u, 0x38D4u, 0xCA9Du, 0x89BBu, 0x7BF2u, 0xE69Eu, 0x14D7u,
+ 0x60D2u, 0x929Bu, 0x0FF7u, 0xFDBEu, 0xBE98u, 0x4CD1u, 0xD1BDu, 0x23F4u,
+ 0x39B7u, 0xCBFEu, 0x5692u, 0xA4DBu, 0xE7FDu, 0x15B4u, 0x88D8u, 0x7A91u,
+ 0x0E94u, 0xFCDDu, 0x61B1u, 0x93F8u, 0xD0DEu, 0x2297u, 0xBFFBu, 0x4DB2u,
+ 0x8B7Du, 0x7934u, 0xE458u, 0x1611u, 0x5537u, 0xA77Eu, 0x3A12u, 0xC85Bu,
+ 0xBC5Eu, 0x4E17u, 0xD37Bu, 0x2132u, 0x6214u, 0x905Du, 0x0D31u, 0xFF78u,
+ 0xE53Bu, 0x1772u, 0x8A1Eu, 0x7857u, 0x3B71u, 0xC938u, 0x5454u, 0xA61Du,
+ 0xD218u, 0x2051u, 0xBD3Du, 0x4F74u, 0x0C52u, 0xFE1Bu, 0x6377u, 0x913Eu
+ },
+ {
+ 0x0000u, 0xCABCu, 0x1ECFu, 0xD473u, 0x3D9Eu, 0xF722u, 0x2351u, 0xE9EDu,
+ 0x7B3Cu, 0xB180u, 0x65F3u, 0xAF4Fu, 0x46A2u, 0x8C1Eu, 0x586Du, 0x92D1u,
+ 0xF678u, 0x3CC4u, 0xE8B7u, 0x220Bu, 0xCBE6u, 0x015Au, 0xD529u, 0x1F95u,
+ 0x8D44u, 0x47F8u, 0x938Bu, 0x5937u, 0xB0DAu, 0x7A66u, 0xAE15u, 0x64A9u,
+ 0x6747u, 0xADFBu, 0x7988u, 0xB334u, 0x5AD9u, 0x9065u, 0x4416u, 0x8EAAu,
+ 0x1C7Bu, 0xD6C7u, 0x02B4u, 0xC808u, 0x21E5u, 0xEB59u, 0x3F2Au, 0xF596u,
+ 0x913Fu, 0x5B83u, 0x8FF0u, 0x454Cu, 0xACA1u, 0x661Du, 0xB26Eu, 0x78D2u,
+ 0xEA03u, 0x20BFu, 0xF4CCu, 0x3E70u, 0xD79Du, 0x1D21u, 0xC952u, 0x03EEu,
+ 0xCE8Eu, 0x0432u, 0xD041u, 0x1AFDu, 0xF310u, 0x39ACu, 0xEDDFu, 0x2763u,
+ 0xB5B2u, 0x7F0Eu, 0xAB7Du, 0x61C1u, 0x882Cu, 0x4290u, 0x96E3u, 0x5C5Fu,
+ 0x38F6u, 0xF24Au, 0x2639u, 0xEC85u, 0x0568u, 0xCFD4u, 0x1BA7u, 0xD11Bu,
+ 0x43CAu, 0x8976u, 0x5D05u, 0x97B9u, 0x7E54u, 0xB4E8u, 0x609Bu, 0xAA27u,
+ 0xA9C9u, 0x6375u, 0xB706u, 0x7DBAu, 0x9457u, 0x5EEBu, 0x8A98u, 0x4024u,
+ 0xD2F5u, 0x1849u, 0xCC3Au, 0x0686u, 0xEF6Bu, 0x25D7u, 0xF1A4u, 0x3B18u,
+ 0x5FB1u, 0x950Du, 0x417Eu, 0x8BC2u, 0x622Fu, 0xA893u, 0x7CE0u, 0xB65Cu,
+ 0x248Du, 0xEE31u, 0x3A42u, 0xF0FEu, 0x1913u, 0xD3AFu, 0x07DCu, 0xCD60u,
+ 0x16ABu, 0xDC17u, 0x0864u, 0xC2D8u, 0x2B35u, 0xE189u, 0x35FAu, 0xFF46u,
+ 0x6D97u, 0xA72Bu, 0x7358u, 0xB9E4u, 0x5009u, 0x9AB5u, 0x4EC6u, 0x847Au,
+ 0xE0D3u, 0x2A6Fu, 0xFE1Cu, 0x34A0u, 0xDD4Du, 0x17F1u, 0xC382u, 0x093Eu,
+ 0x9BEFu, 0x5153u, 0x8520u, 0x4F9Cu, 0xA671u, 0x6CCDu, 0xB8BEu, 0x7202u,
+ 0x71ECu, 0xBB50u, 0x6F23u, 0xA59Fu, 0x4C72u, 0x86CEu, 0x52BDu, 0x9801u,
+ 0x0AD0u, 0xC06Cu, 0x141Fu, 0xDEA3u, 0x374Eu, 0xFDF2u, 0x2981u, 0xE33Du,
+ 0x8794u, 0x4D28u, 0x995Bu, 0x53E7u, 0xBA0Au, 0x70B6u, 0xA4C5u, 0x6E79u,
+ 0xFCA8u, 0x3614u, 0xE267u, 0x28DBu, 0xC136u, 0x0B8Au, 0xDFF9u, 0x1545u,
+ 0xD825u, 0x1299u, 0xC6EAu, 0x0C56u, 0xE5BBu, 0x2F07u, 0xFB74u, 0x31C8u,
+ 0xA319u, 0x69A5u, 0xBDD6u, 0x776Au, 0x9E87u, 0x543Bu, 0x8048u, 0x4AF4u,
+ 0x2E5Du, 0xE4E1u, 0x3092u, 0xFA2Eu, 0x13C3u, 0xD97Fu, 0x0D0Cu, 0xC7B0u,
+ 0x5561u, 0x9FDDu, 0x4BAEu, 0x8112u, 0x68FFu, 0xA243u, 0x7630u, 0xBC8Cu,
+ 0xBF62u, 0x75DEu, 0xA1ADu, 0x6B11u, 0x82FCu, 0x4840u, 0x9C33u, 0x568Fu,
+ 0xC45Eu, 0x0EE2u, 0xDA91u, 0x102Du, 0xF9C0u, 0x337Cu, 0xE70Fu, 0x2DB3u,
+ 0x491Au, 0x83A6u, 0x57D5u, 0x9D69u, 0x7484u, 0xBE38u, 0x6A4Bu, 0xA0F7u,
+ 0x3226u, 0xF89Au, 0x2CE9u, 0xE655u, 0x0FB8u, 0xC504u, 0x1177u, 0xDBCBu
+ },
+ {
+ 0x0000u, 0x2D56u, 0x5AACu, 0x77FAu, 0xB558u, 0x980Eu, 0xEFF4u, 0xC2A2u,
+ 0xE107u, 0xCC51u, 0xBBABu, 0x96FDu, 0x545Fu, 0x7909u, 0x0EF3u, 0x23A5u,
+ 0x49B9u, 0x64EFu, 0x1315u, 0x3E43u, 0xFCE1u, 0xD1B7u, 0xA64Du, 0x8B1Bu,
+ 0xA8BEu, 0x85E8u, 0xF212u, 0xDF44u, 0x1DE6u, 0x30B0u, 0x474Au, 0x6A1Cu,
+ 0x9372u, 0xBE24u, 0xC9DEu, 0xE488u, 0x262Au, 0x0B7Cu, 0x7C86u, 0x51D0u,
+ 0x7275u, 0x5F23u, 0x28D9u, 0x058Fu, 0xC72Du, 0xEA7Bu, 0x9D81u, 0xB0D7u,
+ 0xDACBu, 0xF79Du, 0x8067u, 0xAD31u, 0x6F93u, 0x42C5u, 0x353Fu, 0x1869u,
+ 0x3BCCu, 0x169Au, 0x6160u, 0x4C36u, 0x8E94u, 0xA3C2u, 0xD438u, 0xF96Eu,
+ 0xAD53u, 0x8005u, 0xF7FFu, 0xDAA9u, 0x180Bu, 0x355Du, 0x42A7u, 0x6FF1u,
+ 0x4C54u, 0x6102u, 0x16F8u, 0x3BAEu, 0xF90Cu, 0xD45Au, 0xA3A0u, 0x8EF6u,
+ 0xE4EAu, 0xC9BCu, 0xBE46u, 0x9310u, 0x51B2u, 0x7CE4u, 0x0B1Eu, 0x2648u,
+ 0x05EDu, 0x28BBu, 0x5F41u, 0x7217u, 0xB0B5u, 0x9DE3u, 0xEA19u, 0xC74Fu,
+ 0x3E21u, 0x1377u, 0x648Du, 0x49DBu, 0x8B79u, 0xA62Fu, 0xD1D5u, 0xFC83u,
+ 0xDF26u, 0xF270u, 0x858Au, 0xA8DCu, 0x6A7Eu, 0x4728u, 0x30D2u, 0x1D84u,
+ 0x7798u, 0x5ACEu, 0x2D34u, 0x0062u, 0xC2C0u, 0xEF96u, 0x986Cu, 0xB53Au,
+ 0x969Fu, 0xBBC9u, 0xCC33u, 0xE165u, 0x23C7u, 0x0E91u, 0x796Bu, 0x543Du,
+ 0xD111u, 0xFC47u, 0x8BBDu, 0xA6EBu, 0x6449u, 0x491Fu, 0x3EE5u, 0x13B3u,
+ 0x3016u, 0x1D40u, 0x6ABAu, 0x47ECu, 0x854Eu, 0xA818u, 0xDFE2u, 0xF2B4u,
+ 0x98A8u, 0xB5FEu, 0xC204u, 0xEF52u, 0x2DF0u, 0x00A6u, 0x775Cu, 0x5A0Au,
+ 0x79AFu, 0x54F9u, 0x2303u, 0x0E55u, 0xCCF7u, 0xE1A1u, 0x965Bu, 0xBB0Du,
+ 0x4263u, 0x6F35u, 0x18CFu, 0x3599u, 0xF73Bu, 0xDA6Du, 0xAD97u, 0x80C1u,
+ 0xA364u, 0x8E32u, 0xF9C8u, 0xD49Eu, 0x163Cu, 0x3B6Au, 0x4C90u, 0x61C6u,
+ 0x0BDAu, 0x268Cu, 0x5176u, 0x7C20u, 0xBE82u, 0x93D4u, 0xE42Eu, 0xC978u,
+ 0xEADDu, 0xC78Bu, 0xB071u, 0x9D27u, 0x5F85u, 0x72D3u, 0x0529u, 0x287Fu,
+ 0x7C42u, 0x5114u, 0x26EEu, 0x0BB8u, 0xC91Au, 0xE44Cu, 0x93B6u, 0xBEE0u,
+ 0x9D45u, 0xB013u, 0xC7E9u, 0xEABFu, 0x281Du, 0x054Bu, 0x72B1u, 0x5FE7u,
+ 0x35FBu, 0x18ADu, 0x6F57u, 0x4201u, 0x80A3u, 0xADF5u, 0xDA0Fu, 0xF759u,
+ 0xD4FCu, 0xF9AAu, 0x8E50u, 0xA306u, 0x61A4u, 0x4CF2u, 0x3B08u, 0x165Eu,
+ 0xEF30u, 0xC266u, 0xB59Cu, 0x98CAu, 0x5A68u, 0x773Eu, 0x00C4u, 0x2D92u,
+ 0x0E37u, 0x2361u, 0x549Bu, 0x79CDu, 0xBB6Fu, 0x9639u, 0xE1C3u, 0xCC95u,
+ 0xA689u, 0x8BDFu, 0xFC25u, 0xD173u, 0x13D1u, 0x3E87u, 0x497Du, 0x642Bu,
+ 0x478Eu, 0x6AD8u, 0x1D22u, 0x3074u, 0xF2D6u, 0xDF80u, 0xA87Au, 0x852Cu
+ },
+ {
+ 0x0000u, 0x2995u, 0x532Au, 0x7ABFu, 0xA654u, 0x8FC1u, 0xF57Eu, 0xDCEBu,
+ 0xC71Fu, 0xEE8Au, 0x9435u, 0xBDA0u, 0x614Bu, 0x48DEu, 0x3261u, 0x1BF4u,
+ 0x0589u, 0x2C1Cu, 0x56A3u, 0x7F36u, 0xA3DDu, 0x8A48u, 0xF0F7u, 0xD962u,
+ 0xC296u, 0xEB03u, 0x91BCu, 0xB829u, 0x64C2u, 0x4D57u, 0x37E8u, 0x1E7Du,
+ 0x0B12u, 0x2287u, 0x5838u, 0x71ADu, 0xAD46u, 0x84D3u, 0xFE6Cu, 0xD7F9u,
+ 0xCC0Du, 0xE598u, 0x9F27u, 0xB6B2u, 0x6A59u, 0x43CCu, 0x3973u, 0x10E6u,
+ 0x0E9Bu, 0x270Eu, 0x5DB1u, 0x7424u, 0xA8CFu, 0x815Au, 0xFBE5u, 0xD270u,
+ 0xC984u, 0xE011u, 0x9AAEu, 0xB33Bu, 0x6FD0u, 0x4645u, 0x3CFAu, 0x156Fu,
+ 0x1624u, 0x3FB1u, 0x450Eu, 0x6C9Bu, 0xB070u, 0x99E5u, 0xE35Au, 0xCACFu,
+ 0xD13Bu, 0xF8AEu, 0x8211u, 0xAB84u, 0x776Fu, 0x5EFAu, 0x2445u, 0x0DD0u,
+ 0x13ADu, 0x3A38u, 0x4087u, 0x6912u, 0xB5F9u, 0x9C6Cu, 0xE6D3u, 0xCF46u,
+ 0xD4B2u, 0xFD27u, 0x8798u, 0xAE0Du, 0x72E6u, 0x5B73u, 0x21CCu, 0x0859u,
+ 0x1D36u, 0x34A3u, 0x4E1Cu, 0x6789u, 0xBB62u, 0x92F7u, 0xE848u, 0xC1DDu,
+ 0xDA29u, 0xF3BCu, 0x8903u, 0xA096u, 0x7C7Du, 0x55E8u, 0x2F57u, 0x06C2u,
+ 0x18BFu, 0x312Au, 0x4B95u, 0x6200u, 0xBEEBu, 0x977Eu, 0xEDC1u, 0xC454u,
+ 0xDFA0u, 0xF635u, 0x8C8Au, 0xA51Fu, 0x79F4u, 0x5061u, 0x2ADEu, 0x034Bu,
+ 0x2C48u, 0x05DDu, 0x7F62u, 0x56F7u, 0x8A1Cu, 0xA389u, 0xD936u, 0xF0A3u,
+ 0xEB57u, 0xC2C2u, 0xB87Du, 0x91E8u, 0x4D03u, 0x6496u, 0x1E29u, 0x37BCu,
+ 0x29C1u, 0x0054u, 0x7AEBu, 0x537Eu, 0x8F95u, 0xA600u, 0xDCBFu, 0xF52Au,
+ 0xEEDEu, 0xC74Bu, 0xBDF4u, 0x9461u, 0x488Au, 0x611Fu, 0x1BA0u, 0x3235u,
+ 0x275Au, 0x0ECFu, 0x7470u, 0x5DE5u, 0x810Eu, 0xA89Bu, 0xD224u, 0xFBB1u,
+ 0xE045u, 0xC9D0u, 0xB36Fu, 0x9AFAu, 0x4611u, 0x6F84u, 0x153Bu, 0x3CAEu,
+ 0x22D3u, 0x0B46u, 0x71F9u, 0x586Cu, 0x8487u, 0xAD12u, 0xD7ADu, 0xFE38u,
+ 0xE5CCu, 0xCC59u, 0xB6E6u, 0x9F73u, 0x4398u, 0x6A0Du, 0x10B2u, 0x3927u,
+ 0x3A6Cu, 0x13F9u, 0x6946u, 0x40D3u, 0x9C38u, 0xB5ADu, 0xCF12u, 0xE687u,
+ 0xFD73u, 0xD4E6u, 0xAE59u, 0x87CCu, 0x5B27u, 0x72B2u, 0x080Du, 0x2198u,
+ 0x3FE5u, 0x1670u, 0x6CCFu, 0x455Au, 0x99B1u, 0xB024u, 0xCA9Bu, 0xE30Eu,
+ 0xF8FAu, 0xD16Fu, 0xABD0u, 0x8245u, 0x5EAEu, 0x773Bu, 0x0D84u, 0x2411u,
+ 0x317Eu, 0x18EBu, 0x6254u, 0x4BC1u, 0x972Au, 0xBEBFu, 0xC400u, 0xED95u,
+ 0xF661u, 0xDFF4u, 0xA54Bu, 0x8CDEu, 0x5035u, 0x79A0u, 0x031Fu, 0x2A8Au,
+ 0x34F7u, 0x1D62u, 0x67DDu, 0x4E48u, 0x92A3u, 0xBB36u, 0xC189u, 0xE81Cu,
+ 0xF3E8u, 0xDA7Du, 0xA0C2u, 0x8957u, 0x55BCu, 0x7C29u, 0x0696u, 0x2F03u
+ },
+ {
+ 0x0000u, 0x5890u, 0xB120u, 0xE9B0u, 0xE9F7u, 0xB167u, 0x58D7u, 0x0047u,
+ 0x5859u, 0x00C9u, 0xE979u, 0xB1E9u, 0xB1AEu, 0xE93Eu, 0x008Eu, 0x581Eu,
+ 0xB0B2u, 0xE822u, 0x0192u, 0x5902u, 0x5945u, 0x01D5u, 0xE865u, 0xB0F5u,
+ 0xE8EBu, 0xB07Bu, 0x59CBu, 0x015Bu, 0x011Cu, 0x598Cu, 0xB03Cu, 0xE8ACu,
+ 0xEAD3u, 0xB243u, 0x5BF3u, 0x0363u, 0x0324u, 0x5BB4u, 0xB204u, 0xEA94u,
+ 0xB28Au, 0xEA1Au, 0x03AAu, 0x5B3Au, 0x5B7Du, 0x03EDu, 0xEA5Du, 0xB2CDu,
+ 0x5A61u, 0x02F1u, 0xEB41u, 0xB3D1u, 0xB396u, 0xEB06u, 0x02B6u, 0x5A26u,
+ 0x0238u, 0x5AA8u, 0xB318u, 0xEB88u, 0xEBCFu, 0xB35Fu, 0x5AEFu, 0x027Fu,
+ 0x5E11u, 0x0681u, 0xEF31u, 0xB7A1u, 0xB7E6u, 0xEF76u, 0x06C6u, 0x5E56u,
+ 0x0648u, 0x5ED8u, 0xB768u, 0xEFF8u, 0xEFBFu, 0xB72Fu, 0x5E9Fu, 0x060Fu,
+ 0xEEA3u, 0xB633u, 0x5F83u, 0x0713u, 0x0754u, 0x5FC4u, 0xB674u, 0xEEE4u,
+ 0xB6FAu, 0xEE6Au, 0x07DAu, 0x5F4Au, 0x5F0Du, 0x079Du, 0xEE2Du, 0xB6BDu,
+ 0xB4C2u, 0xEC52u, 0x05E2u, 0x5D72u, 0x5D35u, 0x05A5u, 0xEC15u, 0xB485u,
+ 0xEC9Bu, 0xB40Bu, 0x5DBBu, 0x052Bu, 0x056Cu, 0x5DFCu, 0xB44Cu, 0xECDCu,
+ 0x0470u, 0x5CE0u, 0xB550u, 0xEDC0u, 0xED87u, 0xB517u, 0x5CA7u, 0x0437u,
+ 0x5C29u, 0x04B9u, 0xED09u, 0xB599u, 0xB5DEu, 0xED4Eu, 0x04FEu, 0x5C6Eu,
+ 0xBC22u, 0xE4B2u, 0x0D02u, 0x5592u, 0x55D5u, 0x0D45u, 0xE4F5u, 0xBC65u,
+ 0xE47Bu, 0xBCEBu, 0x555Bu, 0x0DCBu, 0x0D8Cu, 0x551Cu, 0xBCACu, 0xE43Cu,
+ 0x0C90u, 0x5400u, 0xBDB0u, 0xE520u, 0xE567u, 0xBDF7u, 0x5447u, 0x0CD7u,
+ 0x54C9u, 0x0C59u, 0xE5E9u, 0xBD79u, 0xBD3Eu, 0xE5AEu, 0x0C1Eu, 0x548Eu,
+ 0x56F1u, 0x0E61u, 0xE7D1u, 0xBF41u, 0xBF06u, 0xE796u, 0x0E26u, 0x56B6u,
+ 0x0EA8u, 0x5638u, 0xBF88u, 0xE718u, 0xE75Fu, 0xBFCFu, 0x567Fu, 0x0EEFu,
+ 0xE643u, 0xBED3u, 0x5763u, 0x0FF3u, 0x0FB4u, 0x5724u, 0xBE94u, 0xE604u,
+ 0xBE1Au, 0xE68Au, 0x0F3Au, 0x57AAu, 0x57EDu, 0x0F7Du, 0xE6CDu, 0xBE5Du,
+ 0xE233u, 0xBAA3u, 0x5313u, 0x0B83u, 0x0BC4u, 0x5354u, 0xBAE4u, 0xE274u,
+ 0xBA6Au, 0xE2FAu, 0x0B4Au, 0x53DAu, 0x539Du, 0x0B0Du, 0xE2BDu, 0xBA2Du,
+ 0x5281u, 0x0A11u, 0xE3A1u, 0xBB31u, 0xBB76u, 0xE3E6u, 0x0A56u, 0x52C6u,
+ 0x0AD8u, 0x5248u, 0xBBF8u, 0xE368u, 0xE32Fu, 0xBBBFu, 0x520Fu, 0x0A9Fu,
+ 0x08E0u, 0x5070u, 0xB9C0u, 0xE150u, 0xE117u, 0xB987u, 0x5037u, 0x08A7u,
+ 0x50B9u, 0x0829u, 0xE199u, 0xB909u, 0xB94Eu, 0xE1DEu, 0x086Eu, 0x50FEu,
+ 0xB852u, 0xE0C2u, 0x0972u, 0x51E2u, 0x51A5u, 0x0935u, 0xE085u, 0xB815u,
+ 0xE00Bu, 0xB89Bu, 0x512Bu, 0x09BBu, 0x09FCu, 0x516Cu, 0xB8DCu, 0xE04Cu
+ },
+ {
+ 0x0000u, 0xF3F3u, 0x6C51u, 0x9FA2u, 0xD8A2u, 0x2B51u, 0xB4F3u, 0x4700u,
+ 0x3AF3u, 0xC900u, 0x56A2u, 0xA551u, 0xE251u, 0x11A2u, 0x8E00u, 0x7DF3u,
+ 0x75E6u, 0x8615u, 0x19B7u, 0xEA44u, 0xAD44u, 0x5EB7u, 0xC115u, 0x32E6u,
+ 0x4F15u, 0xBCE6u, 0x2344u, 0xD0B7u, 0x97B7u, 0x6444u, 0xFBE6u, 0x0815u,
+ 0xEBCCu, 0x183Fu, 0x879Du, 0x746Eu, 0x336Eu, 0xC09Du, 0x5F3Fu, 0xACCCu,
+ 0xD13Fu, 0x22CCu, 0xBD6Eu, 0x4E9Du, 0x099Du, 0xFA6Eu, 0x65CCu, 0x963Fu,
+ 0x9E2Au, 0x6DD9u, 0xF27Bu, 0x0188u, 0x4688u, 0xB57Bu, 0x2AD9u, 0xD92Au,
+ 0xA4D9u, 0x572Au, 0xC888u, 0x3B7Bu, 0x7C7Bu, 0x8F88u, 0x102Au, 0xE3D9u,
+ 0x5C2Fu, 0xAFDCu, 0x307Eu, 0xC38Du, 0x848Du, 0x777Eu, 0xE8DCu, 0x1B2Fu,
+ 0x66DCu, 0x952Fu, 0x0A8Du, 0xF97Eu, 0xBE7Eu, 0x4D8Du, 0xD22Fu, 0x21DCu,
+ 0x29C9u, 0xDA3Au, 0x4598u, 0xB66Bu, 0xF16Bu, 0x0298u, 0x9D3Au, 0x6EC9u,
+ 0x133Au, 0xE0C9u, 0x7F6Bu, 0x8C98u, 0xCB98u, 0x386Bu, 0xA7C9u, 0x543Au,
+ 0xB7E3u, 0x4410u, 0xDBB2u, 0x2841u, 0x6F41u, 0x9CB2u, 0x0310u, 0xF0E3u,
+ 0x8D10u, 0x7EE3u, 0xE141u, 0x12B2u, 0x55B2u, 0xA641u, 0x39E3u, 0xCA10u,
+ 0xC205u, 0x31F6u, 0xAE54u, 0x5DA7u, 0x1AA7u, 0xE954u, 0x76F6u, 0x8505u,
+ 0xF8F6u, 0x0B05u, 0x94A7u, 0x6754u, 0x2054u, 0xD3A7u, 0x4C05u, 0xBFF6u,
+ 0xB85Eu, 0x4BADu, 0xD40Fu, 0x27FCu, 0x60FCu, 0x930Fu, 0x0CADu, 0xFF5Eu,
+ 0x82ADu, 0x715Eu, 0xEEFCu, 0x1D0Fu, 0x5A0Fu, 0xA9FCu, 0x365Eu, 0xC5ADu,
+ 0xCDB8u, 0x3E4Bu, 0xA1E9u, 0x521Au, 0x151Au, 0xE6E9u, 0x794Bu, 0x8AB8u,
+ 0xF74Bu, 0x04B8u, 0x9B1Au, 0x68E9u, 0x2FE9u, 0xDC1Au, 0x43B8u, 0xB04Bu,
+ 0x5392u, 0xA061u, 0x3FC3u, 0xCC30u, 0x8B30u, 0x78C3u, 0xE761u, 0x1492u,
+ 0x6961u, 0x9A92u, 0x0530u, 0xF6C3u, 0xB1C3u, 0x4230u, 0xDD92u, 0x2E61u,
+ 0x2674u, 0xD587u, 0x4A25u, 0xB9D6u, 0xFED6u, 0x0D25u, 0x9287u, 0x6174u,
+ 0x1C87u, 0xEF74u, 0x70D6u, 0x8325u, 0xC425u, 0x37D6u, 0xA874u, 0x5B87u,
+ 0xE471u, 0x1782u, 0x8820u, 0x7BD3u, 0x3CD3u, 0xCF20u, 0x5082u, 0xA371u,
+ 0xDE82u, 0x2D71u, 0xB2D3u, 0x4120u, 0x0620u, 0xF5D3u, 0x6A71u, 0x9982u,
+ 0x9197u, 0x6264u, 0xFDC6u, 0x0E35u, 0x4935u, 0xBAC6u, 0x2564u, 0xD697u,
+ 0xAB64u, 0x5897u, 0xC735u, 0x34C6u, 0x73C6u, 0x8035u, 0x1F97u, 0xEC64u,
+ 0x0FBDu, 0xFC4Eu, 0x63ECu, 0x901Fu, 0xD71Fu, 0x24ECu, 0xBB4Eu, 0x48BDu,
+ 0x354Eu, 0xC6BDu, 0x591Fu, 0xAAECu, 0xEDECu, 0x1E1Fu, 0x81BDu, 0x724Eu,
+ 0x7A5Bu, 0x89A8u, 0x160Au, 0xE5F9u, 0xA2F9u, 0x510Au, 0xCEA8u, 0x3D5Bu,
+ 0x40A8u, 0xB35Bu, 0x2CF9u, 0xDF0Au, 0x980Au, 0x6BF9u, 0xF45Bu, 0x07A8u
+ },
+ {
+ 0x0000u, 0xFB0Bu, 0x7DA1u, 0x86AAu, 0xFB42u, 0x0049u, 0x86E3u, 0x7DE8u,
+ 0x7D33u, 0x8638u, 0x0092u, 0xFB99u, 0x8671u, 0x7D7Au, 0xFBD0u, 0x00DBu,
+ 0xFA66u, 0x016Du, 0x87C7u, 0x7CCCu, 0x0124u, 0xFA2Fu, 0x7C85u, 0x878Eu,
+ 0x8755u, 0x7C5Eu, 0xFAF4u, 0x01FFu, 0x7C17u, 0x871Cu, 0x01B6u, 0xFABDu,
+ 0x7F7Bu, 0x8470u, 0x02DAu, 0xF9D1u, 0x8439u, 0x7F32u, 0xF998u, 0x0293u,
+ 0x0248u, 0xF943u, 0x7FE9u, 0x84E2u, 0xF90Au, 0x0201u, 0x84ABu, 0x7FA0u,
+ 0x851Du, 0x7E16u, 0xF8BCu, 0x03B7u, 0x7E5Fu, 0x8554u, 0x03FEu, 0xF8F5u,
+ 0xF82Eu, 0x0325u, 0x858Fu, 0x7E84u, 0x036Cu, 0xF867u, 0x7ECDu, 0x85C6u,
+ 0xFEF6u, 0x05FDu, 0x8357u, 0x785Cu, 0x05B4u, 0xFEBFu, 0x7815u, 0x831Eu,
+ 0x83C5u, 0x78CEu, 0xFE64u, 0x056Fu, 0x7887u, 0x838Cu, 0x0526u, 0xFE2Du,
+ 0x0490u, 0xFF9Bu, 0x7931u, 0x823Au, 0xFFD2u, 0x04D9u, 0x8273u, 0x7978u,
+ 0x79A3u, 0x82A8u, 0x0402u, 0xFF09u, 0x82E1u, 0x79EAu, 0xFF40u, 0x044Bu,
+ 0x818Du, 0x7A86u, 0xFC2Cu, 0x0727u, 0x7ACFu, 0x81C4u, 0x076Eu, 0xFC65u,
+ 0xFCBEu, 0x07B5u, 0x811Fu, 0x7A14u, 0x07FCu, 0xFCF7u, 0x7A5Du, 0x8156u,
+ 0x7BEBu, 0x80E0u, 0x064Au, 0xFD41u, 0x80A9u, 0x7BA2u, 0xFD08u, 0x0603u,
+ 0x06D8u, 0xFDD3u, 0x7B79u, 0x8072u, 0xFD9Au, 0x0691u, 0x803Bu, 0x7B30u,
+ 0x765Bu, 0x8D50u, 0x0BFAu, 0xF0F1u, 0x8D19u, 0x7612u, 0xF0B8u, 0x0BB3u,
+ 0x0B68u, 0xF063u, 0x76C9u, 0x8DC2u, 0xF02Au, 0x0B21u, 0x8D8Bu, 0x7680u,
+ 0x8C3Du, 0x7736u, 0xF19Cu, 0x0A97u, 0x777Fu, 0x8C74u, 0x0ADEu, 0xF1D5u,
+ 0xF10Eu, 0x0A05u, 0x8CAFu, 0x77A4u, 0x0A4Cu, 0xF147u, 0x77EDu, 0x8CE6u,
+ 0x0920u, 0xF22Bu, 0x7481u, 0x8F8Au, 0xF262u, 0x0969u, 0x8FC3u, 0x74C8u,
+ 0x7413u, 0x8F18u, 0x09B2u, 0xF2B9u, 0x8F51u, 0x745Au, 0xF2F0u, 0x09FBu,
+ 0xF346u, 0x084Du, 0x8EE7u, 0x75ECu, 0x0804u, 0xF30Fu, 0x75A5u, 0x8EAEu,
+ 0x8E75u, 0x757Eu, 0xF3D4u, 0x08DFu, 0x7537u, 0x8E3Cu, 0x0896u, 0xF39Du,
+ 0x88ADu, 0x73A6u, 0xF50Cu, 0x0E07u, 0x73EFu, 0x88E4u, 0x0E4Eu, 0xF545u,
+ 0xF59Eu, 0x0E95u, 0x883Fu, 0x7334u, 0x0EDCu, 0xF5D7u, 0x737Du, 0x8876u,
+ 0x72CBu, 0x89C0u, 0x0F6Au, 0xF461u, 0x8989u, 0x7282u, 0xF428u, 0x0F23u,
+ 0x0FF8u, 0xF4F3u, 0x7259u, 0x8952u, 0xF4BAu, 0x0FB1u, 0x891Bu, 0x7210u,
+ 0xF7D6u, 0x0CDDu, 0x8A77u, 0x717Cu, 0x0C94u, 0xF79Fu, 0x7135u, 0x8A3Eu,
+ 0x8AE5u, 0x71EEu, 0xF744u, 0x0C4Fu, 0x71A7u, 0x8AACu, 0x0C06u, 0xF70Du,
+ 0x0DB0u, 0xF6BBu, 0x7011u, 0x8B1Au, 0xF6F2u, 0x0DF9u, 0x8B53u, 0x7058u,
+ 0x7083u, 0x8B88u, 0x0D22u, 0xF629u, 0x8BC1u, 0x70CAu, 0xF660u, 0x0D6Bu
+ },
+ {
+ 0x0000u, 0xECB6u, 0x52DBu, 0xBE6Du, 0xA5B6u, 0x4900u, 0xF76Du, 0x1BDBu,
+ 0xC0DBu, 0x2C6Du, 0x9200u, 0x7EB6u, 0x656Du, 0x89DBu, 0x37B6u, 0xDB00u,
+ 0x0A01u, 0xE6B7u, 0x58DAu, 0xB46Cu, 0xAFB7u, 0x4301u, 0xFD6Cu, 0x11DAu,
+ 0xCADAu, 0x266Cu, 0x9801u, 0x74B7u, 0x6F6Cu, 0x83DAu, 0x3DB7u, 0xD101u,
+ 0x1402u, 0xF8B4u, 0x46D9u, 0xAA6Fu, 0xB1B4u, 0x5D02u, 0xE36Fu, 0x0FD9u,
+ 0xD4D9u, 0x386Fu, 0x8602u, 0x6AB4u, 0x716Fu, 0x9DD9u, 0x23B4u, 0xCF02u,
+ 0x1E03u, 0xF2B5u, 0x4CD8u, 0xA06Eu, 0xBBB5u, 0x5703u, 0xE96Eu, 0x05D8u,
+ 0xDED8u, 0x326Eu, 0x8C03u, 0x60B5u, 0x7B6Eu, 0x97D8u, 0x29B5u, 0xC503u,
+ 0x2804u, 0xC4B2u, 0x7ADFu, 0x9669u, 0x8DB2u, 0x6104u, 0xDF69u, 0x33DFu,
+ 0xE8DFu, 0x0469u, 0xBA04u, 0x56B2u, 0x4D69u, 0xA1DFu, 0x1FB2u, 0xF304u,
+ 0x2205u, 0xCEB3u, 0x70DEu, 0x9C68u, 0x87B3u, 0x6B05u, 0xD568u, 0x39DEu,
+ 0xE2DEu, 0x0E68u, 0xB005u, 0x5CB3u, 0x4768u, 0xABDEu, 0x15B3u, 0xF905u,
+ 0x3C06u, 0xD0B0u, 0x6EDDu, 0x826Bu, 0x99B0u, 0x7506u, 0xCB6Bu, 0x27DDu,
+ 0xFCDDu, 0x106Bu, 0xAE06u, 0x42B0u, 0x596Bu, 0xB5DDu, 0x0BB0u, 0xE706u,
+ 0x3607u, 0xDAB1u, 0x64DCu, 0x886Au, 0x93B1u, 0x7F07u, 0xC16Au, 0x2DDCu,
+ 0xF6DCu, 0x1A6Au, 0xA407u, 0x48B1u, 0x536Au, 0xBFDCu, 0x01B1u, 0xED07u,
+ 0x5008u, 0xBCBEu, 0x02D3u, 0xEE65u, 0xF5BEu, 0x1908u, 0xA765u, 0x4BD3u,
+ 0x90D3u, 0x7C65u, 0xC208u, 0x2EBEu, 0x3565u, 0xD9D3u, 0x67BEu, 0x8B08u,
+ 0x5A09u, 0xB6BFu, 0x08D2u, 0xE464u, 0xFFBFu, 0x1309u, 0xAD64u, 0x41D2u,
+ 0x9AD2u, 0x7664u, 0xC809u, 0x24BFu, 0x3F64u, 0xD3D2u, 0x6DBFu, 0x8109u,
+ 0x440Au, 0xA8BCu, 0x16D1u, 0xFA67u, 0xE1BCu, 0x0D0Au, 0xB367u, 0x5FD1u,
+ 0x84D1u, 0x6867u, 0xD60Au, 0x3ABCu, 0x2167u, 0xCDD1u, 0x73BCu, 0x9F0Au,
+ 0x4E0Bu, 0xA2BDu, 0x1CD0u, 0xF066u, 0xEBBDu, 0x070Bu, 0xB966u, 0x55D0u,
+ 0x8ED0u, 0x6266u, 0xDC0Bu, 0x30BDu, 0x2B66u, 0xC7D0u, 0x79BDu, 0x950Bu,
+ 0x780Cu, 0x94BAu, 0x2AD7u, 0xC661u, 0xDDBAu, 0x310Cu, 0x8F61u, 0x63D7u,
+ 0xB8D7u, 0x5461u, 0xEA0Cu, 0x06BAu, 0x1D61u, 0xF1D7u, 0x4FBAu, 0xA30Cu,
+ 0x720Du, 0x9EBBu, 0x20D6u, 0xCC60u, 0xD7BBu, 0x3B0Du, 0x8560u, 0x69D6u,
+ 0xB2D6u, 0x5E60u, 0xE00Du, 0x0CBBu, 0x1760u, 0xFBD6u, 0x45BBu, 0xA90Du,
+ 0x6C0Eu, 0x80B8u, 0x3ED5u, 0xD263u, 0xC9B8u, 0x250Eu, 0x9B63u, 0x77D5u,
+ 0xACD5u, 0x4063u, 0xFE0Eu, 0x12B8u, 0x0963u, 0xE5D5u, 0x5BB8u, 0xB70Eu,
+ 0x660Fu, 0x8AB9u, 0x34D4u, 0xD862u, 0xC3B9u, 0x2F0Fu, 0x9162u, 0x7DD4u,
+ 0xA6D4u, 0x4A62u, 0xF40Fu, 0x18B9u, 0x0362u, 0xEFD4u, 0x51B9u, 0xBD0Fu
+ },
+ {
+ 0x0000u, 0xA010u, 0xCB97u, 0x6B87u, 0x1C99u, 0xBC89u, 0xD70Eu, 0x771Eu,
+ 0x3932u, 0x9922u, 0xF2A5u, 0x52B5u, 0x25ABu, 0x85BBu, 0xEE3Cu, 0x4E2Cu,
+ 0x7264u, 0xD274u, 0xB9F3u, 0x19E3u, 0x6EFDu, 0xCEEDu, 0xA56Au, 0x057Au,
+ 0x4B56u, 0xEB46u, 0x80C1u, 0x20D1u, 0x57CFu, 0xF7DFu, 0x9C58u, 0x3C48u,
+ 0xE4C8u, 0x44D8u, 0x2F5Fu, 0x8F4Fu, 0xF851u, 0x5841u, 0x33C6u, 0x93D6u,
+ 0xDDFAu, 0x7DEAu, 0x166Du, 0xB67Du, 0xC163u, 0x6173u, 0x0AF4u, 0xAAE4u,
+ 0x96ACu, 0x36BCu, 0x5D3Bu, 0xFD2Bu, 0x8A35u, 0x2A25u, 0x41A2u, 0xE1B2u,
+ 0xAF9Eu, 0x0F8Eu, 0x6409u, 0xC419u, 0xB307u, 0x1317u, 0x7890u, 0xD880u,
+ 0x4227u, 0xE237u, 0x89B0u, 0x29A0u, 0x5EBEu, 0xFEAEu, 0x9529u, 0x3539u,
+ 0x7B15u, 0xDB05u, 0xB082u, 0x1092u, 0x678Cu, 0xC79Cu, 0xAC1Bu, 0x0C0Bu,
+ 0x3043u, 0x9053u, 0xFBD4u, 0x5BC4u, 0x2CDAu, 0x8CCAu, 0xE74Du, 0x475Du,
+ 0x0971u, 0xA961u, 0xC2E6u, 0x62F6u, 0x15E8u, 0xB5F8u, 0xDE7Fu, 0x7E6Fu,
+ 0xA6EFu, 0x06FFu, 0x6D78u, 0xCD68u, 0xBA76u, 0x1A66u, 0x71E1u, 0xD1F1u,
+ 0x9FDDu, 0x3FCDu, 0x544Au, 0xF45Au, 0x8344u, 0x2354u, 0x48D3u, 0xE8C3u,
+ 0xD48Bu, 0x749Bu, 0x1F1Cu, 0xBF0Cu, 0xC812u, 0x6802u, 0x0385u, 0xA395u,
+ 0xEDB9u, 0x4DA9u, 0x262Eu, 0x863Eu, 0xF120u, 0x5130u, 0x3AB7u, 0x9AA7u,
+ 0x844Eu, 0x245Eu, 0x4FD9u, 0xEFC9u, 0x98D7u, 0x38C7u, 0x5340u, 0xF350u,
+ 0xBD7Cu, 0x1D6Cu, 0x76EBu, 0xD6FBu, 0xA1E5u, 0x01F5u, 0x6A72u, 0xCA62u,
+ 0xF62Au, 0x563Au, 0x3DBDu, 0x9DADu, 0xEAB3u, 0x4AA3u, 0x2124u, 0x8134u,
+ 0xCF18u, 0x6F08u, 0x048Fu, 0xA49Fu, 0xD381u, 0x7391u, 0x1816u, 0xB806u,
+ 0x6086u, 0xC096u, 0xAB11u, 0x0B01u, 0x7C1Fu, 0xDC0Fu, 0xB788u, 0x1798u,
+ 0x59B4u, 0xF9A4u, 0x9223u, 0x3233u, 0x452Du, 0xE53Du, 0x8EBAu, 0x2EAAu,
+ 0x12E2u, 0xB2F2u, 0xD975u, 0x7965u, 0x0E7Bu, 0xAE6Bu, 0xC5ECu, 0x65FCu,
+ 0x2BD0u, 0x8BC0u, 0xE047u, 0x4057u, 0x3749u, 0x9759u, 0xFCDEu, 0x5CCEu,
+ 0xC669u, 0x6679u, 0x0DFEu, 0xADEEu, 0xDAF0u, 0x7AE0u, 0x1167u, 0xB177u,
+ 0xFF5Bu, 0x5F4Bu, 0x34CCu, 0x94DCu, 0xE3C2u, 0x43D2u, 0x2855u, 0x8845u,
+ 0xB40Du, 0x141Du, 0x7F9Au, 0xDF8Au, 0xA894u, 0x0884u, 0x6303u, 0xC313u,
+ 0x8D3Fu, 0x2D2Fu, 0x46A8u, 0xE6B8u, 0x91A6u, 0x31B6u, 0x5A31u, 0xFA21u,
+ 0x22A1u, 0x82B1u, 0xE936u, 0x4926u, 0x3E38u, 0x9E28u, 0xF5AFu, 0x55BFu,
+ 0x1B93u, 0xBB83u, 0xD004u, 0x7014u, 0x070Au, 0xA71Au, 0xCC9Du, 0x6C8Du,
+ 0x50C5u, 0xF0D5u, 0x9B52u, 0x3B42u, 0x4C5Cu, 0xEC4Cu, 0x87CBu, 0x27DBu,
+ 0x69F7u, 0xC9E7u, 0xA260u, 0x0270u, 0x756Eu, 0xD57Eu, 0xBEF9u, 0x1EE9u
+ },
+ {
+ 0x0000u, 0x832Bu, 0x8DE1u, 0x0ECAu, 0x9075u, 0x135Eu, 0x1D94u, 0x9EBFu,
+ 0xAB5Du, 0x2876u, 0x26BCu, 0xA597u, 0x3B28u, 0xB803u, 0xB6C9u, 0x35E2u,
+ 0xDD0Du, 0x5E26u, 0x50ECu, 0xD3C7u, 0x4D78u, 0xCE53u, 0xC099u, 0x43B2u,
+ 0x7650u, 0xF57Bu, 0xFBB1u, 0x789Au, 0xE625u, 0x650Eu, 0x6BC4u, 0xE8EFu,
+ 0x31ADu, 0xB286u, 0xBC4Cu, 0x3F67u, 0xA1D8u, 0x22F3u, 0x2C39u, 0xAF12u,
+ 0x9AF0u, 0x19DBu, 0x1711u, 0x943Au, 0x0A85u, 0x89AEu, 0x8764u, 0x044Fu,
+ 0xECA0u, 0x6F8Bu, 0x6141u, 0xE26Au, 0x7CD5u, 0xFFFEu, 0xF134u, 0x721Fu,
+ 0x47FDu, 0xC4D6u, 0xCA1Cu, 0x4937u, 0xD788u, 0x54A3u, 0x5A69u, 0xD942u,
+ 0x635Au, 0xE071u, 0xEEBBu, 0x6D90u, 0xF32Fu, 0x7004u, 0x7ECEu, 0xFDE5u,
+ 0xC807u, 0x4B2Cu, 0x45E6u, 0xC6CDu, 0x5872u, 0xDB59u, 0xD593u, 0x56B8u,
+ 0xBE57u, 0x3D7Cu, 0x33B6u, 0xB09Du, 0x2E22u, 0xAD09u, 0xA3C3u, 0x20E8u,
+ 0x150Au, 0x9621u, 0x98EBu, 0x1BC0u, 0x857Fu, 0x0654u, 0x089Eu, 0x8BB5u,
+ 0x52F7u, 0xD1DCu, 0xDF16u, 0x5C3Du, 0xC282u, 0x41A9u, 0x4F63u, 0xCC48u,
+ 0xF9AAu, 0x7A81u, 0x744Bu, 0xF760u, 0x69DFu, 0xEAF4u, 0xE43Eu, 0x6715u,
+ 0x8FFAu, 0x0CD1u, 0x021Bu, 0x8130u, 0x1F8Fu, 0x9CA4u, 0x926Eu, 0x1145u,
+ 0x24A7u, 0xA78Cu, 0xA946u, 0x2A6Du, 0xB4D2u, 0x37F9u, 0x3933u, 0xBA18u,
+ 0xC6B4u, 0x459Fu, 0x4B55u, 0xC87Eu, 0x56C1u, 0xD5EAu, 0xDB20u, 0x580Bu,
+ 0x6DE9u, 0xEEC2u, 0xE008u, 0x6323u, 0xFD9Cu, 0x7EB7u, 0x707Du, 0xF356u,
+ 0x1BB9u, 0x9892u, 0x9658u, 0x1573u, 0x8BCCu, 0x08E7u, 0x062Du, 0x8506u,
+ 0xB0E4u, 0x33CFu, 0x3D05u, 0xBE2Eu, 0x2091u, 0xA3BAu, 0xAD70u, 0x2E5Bu,
+ 0xF719u, 0x7432u, 0x7AF8u, 0xF9D3u, 0x676Cu, 0xE447u, 0xEA8Du, 0x69A6u,
+ 0x5C44u, 0xDF6Fu, 0xD1A5u, 0x528Eu, 0xCC31u, 0x4F1Au, 0x41D0u, 0xC2FBu,
+ 0x2A14u, 0xA93Fu, 0xA7F5u, 0x24DEu, 0xBA61u, 0x394Au, 0x3780u, 0xB4ABu,
+ 0x8149u, 0x0262u, 0x0CA8u, 0x8F83u, 0x113Cu, 0x9217u, 0x9CDDu, 0x1FF6u,
+ 0xA5EEu, 0x26C5u, 0x280Fu, 0xAB24u, 0x359Bu, 0xB6B0u, 0xB87Au, 0x3B51u,
+ 0x0EB3u, 0x8D98u, 0x8352u, 0x0079u, 0x9EC6u, 0x1DEDu, 0x1327u, 0x900Cu,
+ 0x78E3u, 0xFBC8u, 0xF502u, 0x7629u, 0xE896u, 0x6BBDu, 0x6577u, 0xE65Cu,
+ 0xD3BEu, 0x5095u, 0x5E5Fu, 0xDD74u, 0x43CBu, 0xC0E0u, 0xCE2Au, 0x4D01u,
+ 0x9443u, 0x1768u, 0x19A2u, 0x9A89u, 0x0436u, 0x871Du, 0x89D7u, 0x0AFCu,
+ 0x3F1Eu, 0xBC35u, 0xB2FFu, 0x31D4u, 0xAF6Bu, 0x2C40u, 0x228Au, 0xA1A1u,
+ 0x494Eu, 0xCA65u, 0xC4AFu, 0x4784u, 0xD93Bu, 0x5A10u, 0x54DAu, 0xD7F1u,
+ 0xE213u, 0x6138u, 0x6FF2u, 0xECD9u, 0x7266u, 0xF14Du, 0xFF87u, 0x7CACu
+ }
};

__u16 crc_t10dif_generic(__u16 crc, const unsigned char *buffer, size_t len)
{
- unsigned int i;
+ const __u8 *i = (const __u8 *)buffer;
+ const __u8 *i_end = i + len;
+ const __u8 *i_last16 = i + (len / 16 * 16);
Please change the above variable names to something meaningful.

- for (i = 0 ; i < len ; i++)
- crc = (crc << 8) ^ t10_dif_crc_table[((crc >> 8) ^ buffer[i]) & 0xff];
+ for (; i < i_last16; i += 16) {
Initialize the loop variable in the for statement if possible; it makes the code more readable.
+ crc = t10_dif_crc_table[15][i[0] ^ (__u8)(crc >> 8)] ^
+ t10_dif_crc_table[14][i[1] ^ (__u8)(crc >> 0)] ^
Although the original code already had a complex array index calculation, the new code adds another
level of array subscripts. It would be great if you could simplify the code.
+ t10_dif_crc_table[13][i[2]] ^
+ t10_dif_crc_table[12][i[3]] ^
+ t10_dif_crc_table[11][i[4]] ^
+ t10_dif_crc_table[10][i[5]] ^
+ t10_dif_crc_table[9][i[6]] ^
+ t10_dif_crc_table[8][i[7]] ^
+ t10_dif_crc_table[7][i[8]] ^
+ t10_dif_crc_table[6][i[9]] ^
+ t10_dif_crc_table[5][i[10]] ^
+ t10_dif_crc_table[4][i[11]] ^
+ t10_dif_crc_table[3][i[12]] ^
+ t10_dif_crc_table[2][i[13]] ^
+ t10_dif_crc_table[1][i[14]] ^
+ t10_dif_crc_table[0][i[15]];

Something doesn’t look right here. Given that the above code iterates through the entire array
t10_dif_crc_table[15…0][i[0…15]], why can’t we use another loop? Adding another loop
with a couple of switch cases would significantly simplify the code. Use a loop to calculate this quantity and
make the code more readable, instead of hard-coding the lookup 16 times.

Also, do you have L1/L2/LLC I-cache and D-cache hit/miss statistics for the new vs. old code?

+ }
+
+ for (; i < i_end; i++)
+ crc = t10_dif_crc_table[0][*i ^ (__u8)(crc >> 8)] ^ (crc << 8);

return crc;
}
--
1.8.3.1
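
For readers following the review comments above about replacing the 16 hard-coded lookups with a loop, here is a minimal user-space sketch. It is not the submitted patch. It assumes the slice tables can be generated from the 0x8bb7 polynomial with the recurrence table[k][b] = (table[k-1][b] << 8) ^ table[0][table[k-1][b] >> 8], which reproduces entries such as table[6][1] = 0xF249 and table[15][1] = 0x832B from the patch. The names slice_table, t10dif_build_tables, t10dif_bytewise and t10dif_sliced are hypothetical, and whether a compiler unrolls the inner loop as well as the hand-written version is not measured here.

```c
#include <stddef.h>
#include <stdint.h>

#define T10DIF_POLY 0x8BB7u	/* generator polynomial named in the patch */
#define NSLICES     16		/* bytes consumed per main-loop iteration */

/* Hypothetical name; the patch uses a precomputed t10_dif_crc_table[16][256]. */
static uint16_t slice_table[NSLICES][256];

/* table[0] is the classic byte table; table[k] is "byte followed by k zero bytes". */
static void t10dif_build_tables(void)
{
	for (unsigned int b = 0; b < 256; b++) {
		uint16_t crc = (uint16_t)(b << 8);

		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ T10DIF_POLY)
					      : (uint16_t)(crc << 1);
		slice_table[0][b] = crc;
	}
	for (int k = 1; k < NSLICES; k++) {
		for (unsigned int b = 0; b < 256; b++) {
			uint16_t prev = slice_table[k - 1][b];

			slice_table[k][b] =
				(uint16_t)((prev << 8) ^ slice_table[0][prev >> 8]);
		}
	}
}

/* Reference: the original one-byte-at-a-time loop. */
static uint16_t t10dif_bytewise(uint16_t crc, const uint8_t *buf, size_t len)
{
	while (len--)
		crc = (uint16_t)((crc << 8) ^
				 slice_table[0][((crc >> 8) ^ *buf++) & 0xff]);
	return crc;
}

/* Slice-by-16 with the 14 "plain" lookups folded into an inner loop. */
static uint16_t t10dif_sliced(uint16_t crc, const uint8_t *buf, size_t len)
{
	const uint8_t *end = buf + len;
	const uint8_t *end_blocks = buf + (len / NSLICES) * NSLICES;

	for (; buf < end_blocks; buf += NSLICES) {
		/* Only the first two bytes mix in the running 16-bit CRC. */
		uint16_t acc = (uint16_t)(
			slice_table[NSLICES - 1][buf[0] ^ (uint8_t)(crc >> 8)] ^
			slice_table[NSLICES - 2][buf[1] ^ (uint8_t)crc]);

		for (int k = 2; k < NSLICES; k++)
			acc ^= slice_table[NSLICES - 1 - k][buf[k]];
		crc = acc;
	}

	/* Tail: fewer than NSLICES bytes left, fall back to one byte per step. */
	for (; buf < end; buf++)
		crc = (uint16_t)((crc << 8) ^
				 slice_table[0][*buf ^ (uint8_t)(crc >> 8)]);
	return crc;
}
```

Call t10dif_build_tables() once; after that t10dif_sliced(0, buf, len) should return the same value as t10dif_bytewise(0, buf, len) for any length, with the tail loop covering the len % 16 remainder.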






2018-08-13 11:45:21

by David Laight

[permalink] [raw]
Subject: RE: [PATCH] Performance Improvement in CRC16 Calculations.

From: Jeff Lien
> Sent: 10 August 2018 20:12
>
> This patch provides a performance improvement for the CRC16 calculations done in read/write
> workloads using the T10 Type 1/2/3 guard field. For example, today with sequential write
> workloads (one thread/CPU of IO) we consume 100% of the CPU because of the CRC16 computation
> bottleneck. Today's block devices are considerably faster, but the CRC16 calculation prevents
> folks from utilizing the throughput of such devices. To speed up this calculation and expose
> the block device throughput, we slice the old single byte for loop into a 16 byte for loop,
> with a larger CRC table to match. The result has shown 5x performance improvements on various
> big endian and little endian systems running the 4.18.0 kernel version.
>
> FIO Sequential Write, 64K Block Size, Queue Depth 64
> BE Base Kernel: bw=201.5 MiB/s
> BE Modified CRC Calc: bw=968.1 MiB/s
> 4.80x performance improvement
>
> LE Base Kernel: bw=357 MiB/s
> LE Modified CRC Calc: bw=1964 MiB/s
> 5.51x performance improvement
>
> FIO Sequential Read, 64K Block Size, Queue Depth 64
> BE Base Kernel: bw=611.2 MiB/s
> BE Modified CRC calc: bw=684.9 MiB/s
> 1.12x performance improvement
>
> LE Base Kernel: bw=797 MiB/s
> LE Modified CRC Calc: bw=2730 MiB/s
> 3.42x performance improvement
>
> Reviewed-by: Dave Darrington <[email protected]>
> Reviewed-by: Jeff Furlong <[email protected]>
> Signed-off-by: Jeff Lien <[email protected]>
> ---
> crypto/crct10dif_common.c | 605 +++++++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 569 insertions(+), 36 deletions(-)
>
> diff --git a/crypto/crct10dif_common.c b/crypto/crct10dif_common.c
> index b2fab36..40e1d6c 100644
> --- a/crypto/crct10dif_common.c
> +++ b/crypto/crct10dif_common.c
> @@ -32,47 +32,580 @@
> * x^16 + x^15 + x^11 + x^9 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1
> * gt: 0x8bb7
> */
> -static const __u16 t10_dif_crc_table[256] = {
...
> + }
> };
>
> __u16 crc_t10dif_generic(__u16 crc, const unsigned char *buffer, size_t len)
> {
> - unsigned int i;
> + const __u8 *i = (const __u8 *)buffer;
> + const __u8 *i_end = i + len;
> + const __u8 *i_last16 = i + (len / 16 * 16);
>
> - for (i = 0 ; i < len ; i++)
> - crc = (crc << 8) ^ t10_dif_crc_table[((crc >> 8) ^ buffer[i]) & 0xff];
> + for (; i < i_last16; i += 16) {
> + crc = t10_dif_crc_table[15][i[0] ^ (__u8)(crc >> 8)] ^
> + t10_dif_crc_table[14][i[1] ^ (__u8)(crc >> 0)] ^
> + t10_dif_crc_table[13][i[2]] ^
> + t10_dif_crc_table[12][i[3]] ^
> + t10_dif_crc_table[11][i[4]] ^
> + t10_dif_crc_table[10][i[5]] ^
> + t10_dif_crc_table[9][i[6]] ^
> + t10_dif_crc_table[8][i[7]] ^
> + t10_dif_crc_table[7][i[8]] ^
> + t10_dif_crc_table[6][i[9]] ^
> + t10_dif_crc_table[5][i[10]] ^
> + t10_dif_crc_table[4][i[11]] ^
> + t10_dif_crc_table[3][i[12]] ^
> + t10_dif_crc_table[2][i[13]] ^
> + t10_dif_crc_table[1][i[14]] ^
> + t10_dif_crc_table[0][i[15]];
...

I suspect that all the gain comes from a slight relaxation of the
register dependency chain and the loop unrolling.

A more interesting version would be to generate the lookup table
for a byte followed by 3 zero bytes.
You could then run four separate register dependency chains using the
same 256 entry lookup table.
A little bit of work at the end of the buffer should sort it all out.

There is also the lookup-table-free version:
uint32_t
crc_step(uint32_t crc, uint32_t byte_val)
{
	uint32_t t = crc ^ (byte_val & 0xff);
	t = (t ^ t << 4) & 0xff;
	return crc >> 8 ^ t << 8 ^ t << 3 ^ t >> 4;
}
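
A small check of the table-free step, as a sketch under stated assumptions. The shifts in crc_step match one byte of an LSB-first CRC with the reflected CCITT polynomial 0x8408 (x^16 + x^12 + x^5 + 1), not the MSB-first T10 DIF polynomial 0x8bb7 used by the patch, so it illustrates the technique rather than a drop-in replacement. crc_step_bitwise is a hypothetical reference helper added only for comparison, and the parentheses in crc_step are added for readability.

```c
#include <stdint.h>

/* The table-free step from the message, with explicit parentheses. */
static uint32_t crc_step(uint32_t crc, uint32_t byte_val)
{
	uint32_t t = crc ^ (byte_val & 0xff);

	t = (t ^ (t << 4)) & 0xff;
	return (crc >> 8) ^ (t << 8) ^ (t << 3) ^ (t >> 4);
}

/*
 * Hypothetical reference: process the same byte one bit at a time,
 * LSB first, with the reflected polynomial 0x8408.
 */
static uint32_t crc_step_bitwise(uint32_t crc, uint32_t byte_val)
{
	crc ^= byte_val & 0xff;
	for (int bit = 0; bit < 8; bit++)
		crc = (crc & 1) ? (crc >> 1) ^ 0x8408u : crc >> 1;
	return crc;
}
```

For any 16-bit crc and any byte, crc_step(crc, byte) and crc_step_bitwise(crc, byte) should agree. A T10 DIF equivalent would need its own shift pattern derived from 0x8bb7.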

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2018-08-13 13:50:50

by David Laight

[permalink] [raw]
Subject: RE: [PATCH] Performance Improvement in CRC16 Calculations.

> A more interesting version would be to generate the lookup table
> for a byte followed by 3 zero bytes.
> You could then run four separate register dependency chains using the
> same 256 entry lookup table.

Not sure that works with a table lookup :-(

David


2018-08-13 18:41:27

by Jeffrey Lien

[permalink] [raw]
Subject: RE: [PATCH] Performance Improvement in CRC16 Calculations.

Joe, Doug, Nicolas,
The CONFIG patch change suggested by Joe and Doug makes sense to do. I'll do some additional testing to verify the performance on my systems.


Jeff Lien

-----Original Message-----
From: Joe Perches [mailto:[email protected]]
Sent: Saturday, August 11, 2018 10:06 AM
To: [email protected]; Nicolas Pitre <[email protected]>
Cc: Jeffrey Lien <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; David Darrington <[email protected]>; Jeff Furlong <[email protected]>
Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.

On Sat, 2018-08-11 at 02:04 -0700, Joe Perches wrote:
> On Fri, 2018-08-10 at 22:39 -0400, Douglas Gilbert wrote:
> > but below is a copy and paste of a table 27 from draft SBC-4
> > revision 15 in chapter 4.22.4.4 on page 87.
>
> The posted code returns the proper crc for each
> CONFIG_CRYPTO_CRCT10DIF_TABLE_SIZE value from
> 1 to 5 for these arrays.

Jeff, could you please test the suggested patch with your comparison framework again with each CONFIG_CRYPTO_CRCT10DIF_TABLE_SIZE from 1 to 5?

I get on a very limited test framework here:
(runtime average of 10 runs)

1: 4.32
2: 1.86
3: 1.31
4: 1.05
5: 0.99

2018-08-13 22:44:56

by Tim Chen

[permalink] [raw]
Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.

On 08/10/2018 12:12 PM, Jeff Lien wrote:
> This patch provides a performance improvement for the CRC16 calculations done in read/write
> workloads using the T10 Type 1/2/3 guard field. For example, today with sequential write
> workloads (one thread/CPU of IO) we consume 100% of the CPU because of the CRC16 computation
> bottleneck. Today's block devices are considerably faster, but the CRC16 calculation prevents
> folks from utilizing the throughput of such devices. To speed up this calculation and expose
> the block device throughput, we slice the old single byte for loop into a 16 byte for loop,
> with a larger CRC table to match. The result has shown 5x performance improvements on various
> big endian and little endian systems running the 4.18.0 kernel version.
>
> FIO Sequential Write, 64K Block Size, Queue Depth 64
> BE Base Kernel: bw=201.5 MiB/s
> BE Modified CRC Calc: bw=968.1 MiB/s
> 4.80x performance improvement
>
> LE Base Kernel: bw=357 MiB/s
> LE Modified CRC Calc: bw=1964 MiB/s
> 5.51x performance improvement
>
> FIO Sequential Read, 64K Block Size, Queue Depth 64
> BE Base Kernel: bw=611.2 MiB/s
> BE Modified CRC calc: bw=684.9 MiB/s
> 1.12x performance improvement
>
> LE Base Kernel: bw=797 MiB/s
> LE Modified CRC Calc: bw=2730 MiB/s
> 3.42x performance improvement
>
> Reviewed-by: Dave Darrington <[email protected]>
> Reviewed-by: Jeff Furlong <[email protected]>
> Signed-off-by: Jeff Lien <[email protected]>
> ---
> crypto/crct10dif_common.c | 605 +++++++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 569 insertions(+), 36 deletions(-)
>
> diff --git a/crypto/crct10dif_common.c b/crypto/crct10dif_common.c
> index b2fab36..40e1d6c 100644
> --- a/crypto/crct10dif_common.c
> +++ b/crypto/crct10dif_common.c
> @@ -32,47 +32,580 @@
> * x^16 + x^15 + x^11 + x^9 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1
> * gt: 0x8bb7
> */
> -static const __u16 t10_dif_crc_table[256] = {
> - 0x0000, 0x8BB7, 0x9CD9, 0x176E, 0xB205, 0x39B2, 0x2EDC, 0xA56B,
> - 0xEFBD, 0x640A, 0x7364, 0xF8D3, 0x5DB8, 0xD60F, 0xC161, 0x4AD6,
> - 0x54CD, 0xDF7A, 0xC814, 0x43A3, 0xE6C8, 0x6D7F, 0x7A11, 0xF1A6,
> - 0xBB70, 0x30C7, 0x27A9, 0xAC1E, 0x0975, 0x82C2, 0x95AC, 0x1E1B,
> - 0xA99A, 0x222D, 0x3543, 0xBEF4, 0x1B9F, 0x9028, 0x8746, 0x0CF1,
> - 0x4627, 0xCD90, 0xDAFE, 0x5149, 0xF422, 0x7F95, 0x68FB, 0xE34C,
> - 0xFD57, 0x76E0, 0x618E, 0xEA39, 0x4F52, 0xC4E5, 0xD38B, 0x583C,
> - 0x12EA, 0x995D, 0x8E33, 0x0584, 0xA0EF, 0x2B58, 0x3C36, 0xB781,
> - 0xD883, 0x5334, 0x445A, 0xCFED, 0x6A86, 0xE131, 0xF65F, 0x7DE8,
> - 0x373E, 0xBC89, 0xABE7, 0x2050, 0x853B, 0x0E8C, 0x19E2, 0x9255,
> - 0x8C4E, 0x07F9, 0x1097, 0x9B20, 0x3E4B, 0xB5FC, 0xA292, 0x2925,
> - 0x63F3, 0xE844, 0xFF2A, 0x749D, 0xD1F6, 0x5A41, 0x4D2F, 0xC698,
> - 0x7119, 0xFAAE, 0xEDC0, 0x6677, 0xC31C, 0x48AB, 0x5FC5, 0xD472,
> - 0x9EA4, 0x1513, 0x027D, 0x89CA, 0x2CA1, 0xA716, 0xB078, 0x3BCF,
> - 0x25D4, 0xAE63, 0xB90D, 0x32BA, 0x97D1, 0x1C66, 0x0B08, 0x80BF,
> - 0xCA69, 0x41DE, 0x56B0, 0xDD07, 0x786C, 0xF3DB, 0xE4B5, 0x6F02,
> - 0x3AB1, 0xB106, 0xA668, 0x2DDF, 0x88B4, 0x0303, 0x146D, 0x9FDA,
> - 0xD50C, 0x5EBB, 0x49D5, 0xC262, 0x6709, 0xECBE, 0xFBD0, 0x7067,
> - 0x6E7C, 0xE5CB, 0xF2A5, 0x7912, 0xDC79, 0x57CE, 0x40A0, 0xCB17,
> - 0x81C1, 0x0A76, 0x1D18, 0x96AF, 0x33C4, 0xB873, 0xAF1D, 0x24AA,
> - 0x932B, 0x189C, 0x0FF2, 0x8445, 0x212E, 0xAA99, 0xBDF7, 0x3640,
> - 0x7C96, 0xF721, 0xE04F, 0x6BF8, 0xCE93, 0x4524, 0x524A, 0xD9FD,
> - 0xC7E6, 0x4C51, 0x5B3F, 0xD088, 0x75E3, 0xFE54, 0xE93A, 0x628D,
> - 0x285B, 0xA3EC, 0xB482, 0x3F35, 0x9A5E, 0x11E9, 0x0687, 0x8D30,
> - 0xE232, 0x6985, 0x7EEB, 0xF55C, 0x5037, 0xDB80, 0xCCEE, 0x4759,
> - 0x0D8F, 0x8638, 0x9156, 0x1AE1, 0xBF8A, 0x343D, 0x2353, 0xA8E4,
> - 0xB6FF, 0x3D48, 0x2A26, 0xA191, 0x04FA, 0x8F4D, 0x9823, 0x1394,
> - 0x5942, 0xD2F5, 0xC59B, 0x4E2C, 0xEB47, 0x60F0, 0x779E, 0xFC29,
> - 0x4BA8, 0xC01F, 0xD771, 0x5CC6, 0xF9AD, 0x721A, 0x6574, 0xEEC3,
> - 0xA415, 0x2FA2, 0x38CC, 0xB37B, 0x1610, 0x9DA7, 0x8AC9, 0x017E,
> - 0x1F65, 0x94D2, 0x83BC, 0x080B, 0xAD60, 0x26D7, 0x31B9, 0xBA0E,
> - 0xF0D8, 0x7B6F, 0x6C01, 0xE7B6, 0x42DD, 0xC96A, 0xDE04, 0x55B3
> +static const __u16 t10_dif_crc_table[16][256] = {
> + {
> + 0x0000u, 0x8BB7u, 0x9CD9u, 0x176Eu, 0xB205u, 0x39B2u, 0x2EDCu, 0xA56Bu,
> + 0xEFBDu, 0x640Au, 0x7364u, 0xF8D3u, 0x5DB8u, 0xD60Fu, 0xC161u, 0x4AD6u,
> + 0x54CDu, 0xDF7Au, 0xC814u, 0x43A3u, 0xE6C8u, 0x6D7Fu, 0x7A11u, 0xF1A6u,
> + 0xBB70u, 0x30C7u, 0x27A9u, 0xAC1Eu, 0x0975u, 0x82C2u, 0x95ACu, 0x1E1Bu,
> + 0xA99Au, 0x222Du, 0x3543u, 0xBEF4u, 0x1B9Fu, 0x9028u, 0x8746u, 0x0CF1u,
> + 0x4627u, 0xCD90u, 0xDAFEu, 0x5149u, 0xF422u, 0x7F95u, 0x68FBu, 0xE34Cu,
> + 0xFD57u, 0x76E0u, 0x618Eu, 0xEA39u, 0x4F52u, 0xC4E5u, 0xD38Bu, 0x583Cu,
> + 0x12EAu, 0x995Du, 0x8E33u, 0x0584u, 0xA0EFu, 0x2B58u, 0x3C36u, 0xB781u,
> + 0xD883u, 0x5334u, 0x445Au, 0xCFEDu, 0x6A86u, 0xE131u, 0xF65Fu, 0x7DE8u,
> + 0x373Eu, 0xBC89u, 0xABE7u, 0x2050u, 0x853Bu, 0x0E8Cu, 0x19E2u, 0x9255u,
> + 0x8C4Eu, 0x07F9u, 0x1097u, 0x9B20u, 0x3E4Bu, 0xB5FCu, 0xA292u, 0x2925u,
> + 0x63F3u, 0xE844u, 0xFF2Au, 0x749Du, 0xD1F6u, 0x5A41u, 0x4D2Fu, 0xC698u,
> + 0x7119u, 0xFAAEu, 0xEDC0u, 0x6677u, 0xC31Cu, 0x48ABu, 0x5FC5u, 0xD472u,
> + 0x9EA4u, 0x1513u, 0x027Du, 0x89CAu, 0x2CA1u, 0xA716u, 0xB078u, 0x3BCFu,
> + 0x25D4u, 0xAE63u, 0xB90Du, 0x32BAu, 0x97D1u, 0x1C66u, 0x0B08u, 0x80BFu,
> + 0xCA69u, 0x41DEu, 0x56B0u, 0xDD07u, 0x786Cu, 0xF3DBu, 0xE4B5u, 0x6F02u,
> + 0x3AB1u, 0xB106u, 0xA668u, 0x2DDFu, 0x88B4u, 0x0303u, 0x146Du, 0x9FDAu,
> + 0xD50Cu, 0x5EBBu, 0x49D5u, 0xC262u, 0x6709u, 0xECBEu, 0xFBD0u, 0x7067u,
> + 0x6E7Cu, 0xE5CBu, 0xF2A5u, 0x7912u, 0xDC79u, 0x57CEu, 0x40A0u, 0xCB17u,
> + 0x81C1u, 0x0A76u, 0x1D18u, 0x96AFu, 0x33C4u, 0xB873u, 0xAF1Du, 0x24AAu,
> + 0x932Bu, 0x189Cu, 0x0FF2u, 0x8445u, 0x212Eu, 0xAA99u, 0xBDF7u, 0x3640u,
> + 0x7C96u, 0xF721u, 0xE04Fu, 0x6BF8u, 0xCE93u, 0x4524u, 0x524Au, 0xD9FDu,
> + 0xC7E6u, 0x4C51u, 0x5B3Fu, 0xD088u, 0x75E3u, 0xFE54u, 0xE93Au, 0x628Du,
> + 0x285Bu, 0xA3ECu, 0xB482u, 0x3F35u, 0x9A5Eu, 0x11E9u, 0x0687u, 0x8D30u,
> + 0xE232u, 0x6985u, 0x7EEBu, 0xF55Cu, 0x5037u, 0xDB80u, 0xCCEEu, 0x4759u,
> + 0x0D8Fu, 0x8638u, 0x9156u, 0x1AE1u, 0xBF8Au, 0x343Du, 0x2353u, 0xA8E4u,
> + 0xB6FFu, 0x3D48u, 0x2A26u, 0xA191u, 0x04FAu, 0x8F4Du, 0x9823u, 0x1394u,
> + 0x5942u, 0xD2F5u, 0xC59Bu, 0x4E2Cu, 0xEB47u, 0x60F0u, 0x779Eu, 0xFC29u,
> + 0x4BA8u, 0xC01Fu, 0xD771u, 0x5CC6u, 0xF9ADu, 0x721Au, 0x6574u, 0xEEC3u,
> + 0xA415u, 0x2FA2u, 0x38CCu, 0xB37Bu, 0x1610u, 0x9DA7u, 0x8AC9u, 0x017Eu,
> + 0x1F65u, 0x94D2u, 0x83BCu, 0x080Bu, 0xAD60u, 0x26D7u, 0x31B9u, 0xBA0Eu,
> + 0xF0D8u, 0x7B6Fu, 0x6C01u, 0xE7B6u, 0x42DDu, 0xC96Au, 0xDE04u, 0x55B3u
> + },
> + {
> + 0x0000u, 0x7562u, 0xEAC4u, 0x9FA6u, 0x5E3Fu, 0x2B5Du, 0xB4FBu, 0xC199u,
> + 0xBC7Eu, 0xC91Cu, 0x56BAu, 0x23D8u, 0xE241u, 0x9723u, 0x0885u, 0x7DE7u,
> + 0xF34Bu, 0x8629u, 0x198Fu, 0x6CEDu, 0xAD74u, 0xD816u, 0x47B0u, 0x32D2u,
> + 0x4F35u, 0x3A57u, 0xA5F1u, 0xD093u, 0x110Au, 0x6468u, 0xFBCEu, 0x8EACu,
> + 0x6D21u, 0x1843u, 0x87E5u, 0xF287u, 0x331Eu, 0x467Cu, 0xD9DAu, 0xACB8u,
> + 0xD15Fu, 0xA43Du, 0x3B9Bu, 0x4EF9u, 0x8F60u, 0xFA02u, 0x65A4u, 0x10C6u,
> + 0x9E6Au, 0xEB08u, 0x74AEu, 0x01CCu, 0xC055u, 0xB537u, 0x2A91u, 0x5FF3u,
> + 0x2214u, 0x5776u, 0xC8D0u, 0xBDB2u, 0x7C2Bu, 0x0949u, 0x96EFu, 0xE38Du,
> + 0xDA42u, 0xAF20u, 0x3086u, 0x45E4u, 0x847Du, 0xF11Fu, 0x6EB9u, 0x1BDBu,
> + 0x663Cu, 0x135Eu, 0x8CF8u, 0xF99Au, 0x3803u, 0x4D61u, 0xD2C7u, 0xA7A5u,
> + 0x2909u, 0x5C6Bu, 0xC3CDu, 0xB6AFu, 0x7736u, 0x0254u, 0x9DF2u, 0xE890u,
> + 0x9577u, 0xE015u, 0x7FB3u, 0x0AD1u, 0xCB48u, 0xBE2Au, 0x218Cu, 0x54EEu,
> + 0xB763u, 0xC201u, 0x5DA7u, 0x28C5u, 0xE95Cu, 0x9C3Eu, 0x0398u, 0x76FAu,
> + 0x0B1Du, 0x7E7Fu, 0xE1D9u, 0x94BBu, 0x5522u, 0x2040u, 0xBFE6u, 0xCA84u,
> + 0x4428u, 0x314Au, 0xAEECu, 0xDB8Eu, 0x1A17u, 0x6F75u, 0xF0D3u, 0x85B1u,
> + 0xF856u, 0x8D34u, 0x1292u, 0x67F0u, 0xA669u, 0xD30Bu, 0x4CADu, 0x39CFu,
> + 0x3F33u, 0x4A51u, 0xD5F7u, 0xA095u, 0x610Cu, 0x146Eu, 0x8BC8u, 0xFEAAu,
> + 0x834Du, 0xF62Fu, 0x6989u, 0x1CEBu, 0xDD72u, 0xA810u, 0x37B6u, 0x42D4u,
> + 0xCC78u, 0xB91Au, 0x26BCu, 0x53DEu, 0x9247u, 0xE725u, 0x7883u, 0x0DE1u,
> + 0x7006u, 0x0564u, 0x9AC2u, 0xEFA0u, 0x2E39u, 0x5B5Bu, 0xC4FDu, 0xB19Fu,
> + 0x5212u, 0x2770u, 0xB8D6u, 0xCDB4u, 0x0C2Du, 0x794Fu, 0xE6E9u, 0x938Bu,
> + 0xEE6Cu, 0x9B0Eu, 0x04A8u, 0x71CAu, 0xB053u, 0xC531u, 0x5A97u, 0x2FF5u,
> + 0xA159u, 0xD43Bu, 0x4B9Du, 0x3EFFu, 0xFF66u, 0x8A04u, 0x15A2u, 0x60C0u,
> + 0x1D27u, 0x6845u, 0xF7E3u, 0x8281u, 0x4318u, 0x367Au, 0xA9DCu, 0xDCBEu,
> + 0xE571u, 0x9013u, 0x0FB5u, 0x7AD7u, 0xBB4Eu, 0xCE2Cu, 0x518Au, 0x24E8u,
> + 0x590Fu, 0x2C6Du, 0xB3CBu, 0xC6A9u, 0x0730u, 0x7252u, 0xEDF4u, 0x9896u,
> + 0x163Au, 0x6358u, 0xFCFEu, 0x899Cu, 0x4805u, 0x3D67u, 0xA2C1u, 0xD7A3u,
> + 0xAA44u, 0xDF26u, 0x4080u, 0x35E2u, 0xF47Bu, 0x8119u, 0x1EBFu, 0x6BDDu,
> + 0x8850u, 0xFD32u, 0x6294u, 0x17F6u, 0xD66Fu, 0xA30Du, 0x3CABu, 0x49C9u,
> + 0x342Eu, 0x414Cu, 0xDEEAu, 0xAB88u, 0x6A11u, 0x1F73u, 0x80D5u, 0xF5B7u,
> + 0x7B1Bu, 0x0E79u, 0x91DFu, 0xE4BDu, 0x2524u, 0x5046u, 0xCFE0u, 0xBA82u,
> + 0xC765u, 0xB207u, 0x2DA1u, 0x58C3u, 0x995Au, 0xEC38u, 0x739Eu, 0x06FCu
> + },
> + {
> + 0x0000u, 0x7E66u, 0xFCCCu, 0x82AAu, 0x722Fu, 0x0C49u, 0x8EE3u, 0xF085u,
> + 0xE45Eu, 0x9A38u, 0x1892u, 0x66F4u, 0x9671u, 0xE817u, 0x6ABDu, 0x14DBu,
> + 0x430Bu, 0x3D6Du, 0xBFC7u, 0xC1A1u, 0x3124u, 0x4F42u, 0xCDE8u, 0xB38Eu,
> + 0xA755u, 0xD933u, 0x5B99u, 0x25FFu, 0xD57Au, 0xAB1Cu, 0x29B6u, 0x57D0u,
> + 0x8616u, 0xF870u, 0x7ADAu, 0x04BCu, 0xF439u, 0x8A5Fu, 0x08F5u, 0x7693u,
> + 0x6248u, 0x1C2Eu, 0x9E84u, 0xE0E2u, 0x1067u, 0x6E01u, 0xECABu, 0x92CDu,
> + 0xC51Du, 0xBB7Bu, 0x39D1u, 0x47B7u, 0xB732u, 0xC954u, 0x4BFEu, 0x3598u,
> + 0x2143u, 0x5F25u, 0xDD8Fu, 0xA3E9u, 0x536Cu, 0x2D0Au, 0xAFA0u, 0xD1C6u,
> + 0x879Bu, 0xF9FDu, 0x7B57u, 0x0531u, 0xF5B4u, 0x8BD2u, 0x0978u, 0x771Eu,
> + 0x63C5u, 0x1DA3u, 0x9F09u, 0xE16Fu, 0x11EAu, 0x6F8Cu, 0xED26u, 0x9340u,
> + 0xC490u, 0xBAF6u, 0x385Cu, 0x463Au, 0xB6BFu, 0xC8D9u, 0x4A73u, 0x3415u,
> + 0x20CEu, 0x5EA8u, 0xDC02u, 0xA264u, 0x52E1u, 0x2C87u, 0xAE2Du, 0xD04Bu,
> + 0x018Du, 0x7FEBu, 0xFD41u, 0x8327u, 0x73A2u, 0x0DC4u, 0x8F6Eu, 0xF108u,
> + 0xE5D3u, 0x9BB5u, 0x191Fu, 0x6779u, 0x97FCu, 0xE99Au, 0x6B30u, 0x1556u,
> + 0x4286u, 0x3CE0u, 0xBE4Au, 0xC02Cu, 0x30A9u, 0x4ECFu, 0xCC65u, 0xB203u,
> + 0xA6D8u, 0xD8BEu, 0x5A14u, 0x2472u, 0xD4F7u, 0xAA91u, 0x283Bu, 0x565Du,
> + 0x8481u, 0xFAE7u, 0x784Du, 0x062Bu, 0xF6AEu, 0x88C8u, 0x0A62u, 0x7404u,
> + 0x60DFu, 0x1EB9u, 0x9C13u, 0xE275u, 0x12F0u, 0x6C96u, 0xEE3Cu, 0x905Au,
> + 0xC78Au, 0xB9ECu, 0x3B46u, 0x4520u, 0xB5A5u, 0xCBC3u, 0x4969u, 0x370Fu,
> + 0x23D4u, 0x5DB2u, 0xDF18u, 0xA17Eu, 0x51FBu, 0x2F9Du, 0xAD37u, 0xD351u,
> + 0x0297u, 0x7CF1u, 0xFE5Bu, 0x803Du, 0x70B8u, 0x0EDEu, 0x8C74u, 0xF212u,
> + 0xE6C9u, 0x98AFu, 0x1A05u, 0x6463u, 0x94E6u, 0xEA80u, 0x682Au, 0x164Cu,
> + 0x419Cu, 0x3FFAu, 0xBD50u, 0xC336u, 0x33B3u, 0x4DD5u, 0xCF7Fu, 0xB119u,
> + 0xA5C2u, 0xDBA4u, 0x590Eu, 0x2768u, 0xD7EDu, 0xA98Bu, 0x2B21u, 0x5547u,
> + 0x031Au, 0x7D7Cu, 0xFFD6u, 0x81B0u, 0x7135u, 0x0F53u, 0x8DF9u, 0xF39Fu,
> + 0xE744u, 0x9922u, 0x1B88u, 0x65EEu, 0x956Bu, 0xEB0Du, 0x69A7u, 0x17C1u,
> + 0x4011u, 0x3E77u, 0xBCDDu, 0xC2BBu, 0x323Eu, 0x4C58u, 0xCEF2u, 0xB094u,
> + 0xA44Fu, 0xDA29u, 0x5883u, 0x26E5u, 0xD660u, 0xA806u, 0x2AACu, 0x54CAu,
> + 0x850Cu, 0xFB6Au, 0x79C0u, 0x07A6u, 0xF723u, 0x8945u, 0x0BEFu, 0x7589u,
> + 0x6152u, 0x1F34u, 0x9D9Eu, 0xE3F8u, 0x137Du, 0x6D1Bu, 0xEFB1u, 0x91D7u,
> + 0xC607u, 0xB861u, 0x3ACBu, 0x44ADu, 0xB428u, 0xCA4Eu, 0x48E4u, 0x3682u,
> + 0x2259u, 0x5C3Fu, 0xDE95u, 0xA0F3u, 0x5076u, 0x2E10u, 0xACBAu, 0xD2DCu
> + },
> + {
> + 0x0000u, 0x82B5u, 0x8EDDu, 0x0C68u, 0x960Du, 0x14B8u, 0x18D0u, 0x9A65u,
> + 0xA7ADu, 0x2518u, 0x2970u, 0xABC5u, 0x31A0u, 0xB315u, 0xBF7Du, 0x3DC8u,
> + 0xC4EDu, 0x4658u, 0x4A30u, 0xC885u, 0x52E0u, 0xD055u, 0xDC3Du, 0x5E88u,
> + 0x6340u, 0xE1F5u, 0xED9Du, 0x6F28u, 0xF54Du, 0x77F8u, 0x7B90u, 0xF925u,
> + 0x026Du, 0x80D8u, 0x8CB0u, 0x0E05u, 0x9460u, 0x16D5u, 0x1ABDu, 0x9808u,
> + 0xA5C0u, 0x2775u, 0x2B1Du, 0xA9A8u, 0x33CDu, 0xB178u, 0xBD10u, 0x3FA5u,
> + 0xC680u, 0x4435u, 0x485Du, 0xCAE8u, 0x508Du, 0xD238u, 0xDE50u, 0x5CE5u,
> + 0x612Du, 0xE398u, 0xEFF0u, 0x6D45u, 0xF720u, 0x7595u, 0x79FDu, 0xFB48u,
> + 0x04DAu, 0x866Fu, 0x8A07u, 0x08B2u, 0x92D7u, 0x1062u, 0x1C0Au, 0x9EBFu,
> + 0xA377u, 0x21C2u, 0x2DAAu, 0xAF1Fu, 0x357Au, 0xB7CFu, 0xBBA7u, 0x3912u,
> + 0xC037u, 0x4282u, 0x4EEAu, 0xCC5Fu, 0x563Au, 0xD48Fu, 0xD8E7u, 0x5A52u,
> + 0x679Au, 0xE52Fu, 0xE947u, 0x6BF2u, 0xF197u, 0x7322u, 0x7F4Au, 0xFDFFu,
> + 0x06B7u, 0x8402u, 0x886Au, 0x0ADFu, 0x90BAu, 0x120Fu, 0x1E67u, 0x9CD2u,
> + 0xA11Au, 0x23AFu, 0x2FC7u, 0xAD72u, 0x3717u, 0xB5A2u, 0xB9CAu, 0x3B7Fu,
> + 0xC25Au, 0x40EFu, 0x4C87u, 0xCE32u, 0x5457u, 0xD6E2u, 0xDA8Au, 0x583Fu,
> + 0x65F7u, 0xE742u, 0xEB2Au, 0x699Fu, 0xF3FAu, 0x714Fu, 0x7D27u, 0xFF92u,
> + 0x09B4u, 0x8B01u, 0x8769u, 0x05DCu, 0x9FB9u, 0x1D0Cu, 0x1164u, 0x93D1u,
> + 0xAE19u, 0x2CACu, 0x20C4u, 0xA271u, 0x3814u, 0xBAA1u, 0xB6C9u, 0x347Cu,
> + 0xCD59u, 0x4FECu, 0x4384u, 0xC131u, 0x5B54u, 0xD9E1u, 0xD589u, 0x573Cu,
> + 0x6AF4u, 0xE841u, 0xE429u, 0x669Cu, 0xFCF9u, 0x7E4Cu, 0x7224u, 0xF091u,
> + 0x0BD9u, 0x896Cu, 0x8504u, 0x07B1u, 0x9DD4u, 0x1F61u, 0x1309u, 0x91BCu,
> + 0xAC74u, 0x2EC1u, 0x22A9u, 0xA01Cu, 0x3A79u, 0xB8CCu, 0xB4A4u, 0x3611u,
> + 0xCF34u, 0x4D81u, 0x41E9u, 0xC35Cu, 0x5939u, 0xDB8Cu, 0xD7E4u, 0x5551u,
> + 0x6899u, 0xEA2Cu, 0xE644u, 0x64F1u, 0xFE94u, 0x7C21u, 0x7049u, 0xF2FCu,
> + 0x0D6Eu, 0x8FDBu, 0x83B3u, 0x0106u, 0x9B63u, 0x19D6u, 0x15BEu, 0x970Bu,
> + 0xAAC3u, 0x2876u, 0x241Eu, 0xA6ABu, 0x3CCEu, 0xBE7Bu, 0xB213u, 0x30A6u,
> + 0xC983u, 0x4B36u, 0x475Eu, 0xC5EBu, 0x5F8Eu, 0xDD3Bu, 0xD153u, 0x53E6u,
> + 0x6E2Eu, 0xEC9Bu, 0xE0F3u, 0x6246u, 0xF823u, 0x7A96u, 0x76FEu, 0xF44Bu,
> + 0x0F03u, 0x8DB6u, 0x81DEu, 0x036Bu, 0x990Eu, 0x1BBBu, 0x17D3u, 0x9566u,
> + 0xA8AEu, 0x2A1Bu, 0x2673u, 0xA4C6u, 0x3EA3u, 0xBC16u, 0xB07Eu, 0x32CBu,
> + 0xCBEEu, 0x495Bu, 0x4533u, 0xC786u, 0x5DE3u, 0xDF56u, 0xD33Eu, 0x518Bu,
> + 0x6C43u, 0xEEF6u, 0xE29Eu, 0x602Bu, 0xFA4Eu, 0x78FBu, 0x7493u, 0xF626u
> + },
> + {
> + 0x0000u, 0x1368u, 0x26D0u, 0x35B8u, 0x4DA0u, 0x5EC8u, 0x6B70u, 0x7818u,
> + 0x9B40u, 0x8828u, 0xBD90u, 0xAEF8u, 0xD6E0u, 0xC588u, 0xF030u, 0xE358u,
> + 0xBD37u, 0xAE5Fu, 0x9BE7u, 0x888Fu, 0xF097u, 0xE3FFu, 0xD647u, 0xC52Fu,
> + 0x2677u, 0x351Fu, 0x00A7u, 0x13CFu, 0x6BD7u, 0x78BFu, 0x4D07u, 0x5E6Fu,
> + 0xF1D9u, 0xE2B1u, 0xD709u, 0xC461u, 0xBC79u, 0xAF11u, 0x9AA9u, 0x89C1u,
> + 0x6A99u, 0x79F1u, 0x4C49u, 0x5F21u, 0x2739u, 0x3451u, 0x01E9u, 0x1281u,
> + 0x4CEEu, 0x5F86u, 0x6A3Eu, 0x7956u, 0x014Eu, 0x1226u, 0x279Eu, 0x34F6u,
> + 0xD7AEu, 0xC4C6u, 0xF17Eu, 0xE216u, 0x9A0Eu, 0x8966u, 0xBCDEu, 0xAFB6u,
> + 0x6805u, 0x7B6Du, 0x4ED5u, 0x5DBDu, 0x25A5u, 0x36CDu, 0x0375u, 0x101Du,
> + 0xF345u, 0xE02Du, 0xD595u, 0xC6FDu, 0xBEE5u, 0xAD8Du, 0x9835u, 0x8B5Du,
> + 0xD532u, 0xC65Au, 0xF3E2u, 0xE08Au, 0x9892u, 0x8BFAu, 0xBE42u, 0xAD2Au,
> + 0x4E72u, 0x5D1Au, 0x68A2u, 0x7BCAu, 0x03D2u, 0x10BAu, 0x2502u, 0x366Au,
> + 0x99DCu, 0x8AB4u, 0xBF0Cu, 0xAC64u, 0xD47Cu, 0xC714u, 0xF2ACu, 0xE1C4u,
> + 0x029Cu, 0x11F4u, 0x244Cu, 0x3724u, 0x4F3Cu, 0x5C54u, 0x69ECu, 0x7A84u,
> + 0x24EBu, 0x3783u, 0x023Bu, 0x1153u, 0x694Bu, 0x7A23u, 0x4F9Bu, 0x5CF3u,
> + 0xBFABu, 0xACC3u, 0x997Bu, 0x8A13u, 0xF20Bu, 0xE163u, 0xD4DBu, 0xC7B3u,
> + 0xD00Au, 0xC362u, 0xF6DAu, 0xE5B2u, 0x9DAAu, 0x8EC2u, 0xBB7Au, 0xA812u,
> + 0x4B4Au, 0x5822u, 0x6D9Au, 0x7EF2u, 0x06EAu, 0x1582u, 0x203Au, 0x3352u,
> + 0x6D3Du, 0x7E55u, 0x4BEDu, 0x5885u, 0x209Du, 0x33F5u, 0x064Du, 0x1525u,
> + 0xF67Du, 0xE515u, 0xD0ADu, 0xC3C5u, 0xBBDDu, 0xA8B5u, 0x9D0Du, 0x8E65u,
> + 0x21D3u, 0x32BBu, 0x0703u, 0x146Bu, 0x6C73u, 0x7F1Bu, 0x4AA3u, 0x59CBu,
> + 0xBA93u, 0xA9FBu, 0x9C43u, 0x8F2Bu, 0xF733u, 0xE45Bu, 0xD1E3u, 0xC28Bu,
> + 0x9CE4u, 0x8F8Cu, 0xBA34u, 0xA95Cu, 0xD144u, 0xC22Cu, 0xF794u, 0xE4FCu,
> + 0x07A4u, 0x14CCu, 0x2174u, 0x321Cu, 0x4A04u, 0x596Cu, 0x6CD4u, 0x7FBCu,
> + 0xB80Fu, 0xAB67u, 0x9EDFu, 0x8DB7u, 0xF5AFu, 0xE6C7u, 0xD37Fu, 0xC017u,
> + 0x234Fu, 0x3027u, 0x059Fu, 0x16F7u, 0x6EEFu, 0x7D87u, 0x483Fu, 0x5B57u,
> + 0x0538u, 0x1650u, 0x23E8u, 0x3080u, 0x4898u, 0x5BF0u, 0x6E48u, 0x7D20u,
> + 0x9E78u, 0x8D10u, 0xB8A8u, 0xABC0u, 0xD3D8u, 0xC0B0u, 0xF508u, 0xE660u,
> + 0x49D6u, 0x5ABEu, 0x6F06u, 0x7C6Eu, 0x0476u, 0x171Eu, 0x22A6u, 0x31CEu,
> + 0xD296u, 0xC1FEu, 0xF446u, 0xE72Eu, 0x9F36u, 0x8C5Eu, 0xB9E6u, 0xAA8Eu,
> + 0xF4E1u, 0xE789u, 0xD231u, 0xC159u, 0xB941u, 0xAA29u, 0x9F91u, 0x8CF9u,
> + 0x6FA1u, 0x7CC9u, 0x4971u, 0x5A19u, 0x2201u, 0x3169u, 0x04D1u, 0x17B9u
> + },
> + {
> + 0x0000u, 0x2BA3u, 0x5746u, 0x7CE5u, 0xAE8Cu, 0x852Fu, 0xF9CAu, 0xD269u,
> + 0xD6AFu, 0xFD0Cu, 0x81E9u, 0xAA4Au, 0x7823u, 0x5380u, 0x2F65u, 0x04C6u,
> + 0x26E9u, 0x0D4Au, 0x71AFu, 0x5A0Cu, 0x8865u, 0xA3C6u, 0xDF23u, 0xF480u,
> + 0xF046u, 0xDBE5u, 0xA700u, 0x8CA3u, 0x5ECAu, 0x7569u, 0x098Cu, 0x222Fu,
> + 0x4DD2u, 0x6671u, 0x1A94u, 0x3137u, 0xE35Eu, 0xC8FDu, 0xB418u, 0x9FBBu,
> + 0x9B7Du, 0xB0DEu, 0xCC3Bu, 0xE798u, 0x35F1u, 0x1E52u, 0x62B7u, 0x4914u,
> + 0x6B3Bu, 0x4098u, 0x3C7Du, 0x17DEu, 0xC5B7u, 0xEE14u, 0x92F1u, 0xB952u,
> + 0xBD94u, 0x9637u, 0xEAD2u, 0xC171u, 0x1318u, 0x38BBu, 0x445Eu, 0x6FFDu,
> + 0x9BA4u, 0xB007u, 0xCCE2u, 0xE741u, 0x3528u, 0x1E8Bu, 0x626Eu, 0x49CDu,
> + 0x4D0Bu, 0x66A8u, 0x1A4Du, 0x31EEu, 0xE387u, 0xC824u, 0xB4C1u, 0x9F62u,
> + 0xBD4Du, 0x96EEu, 0xEA0Bu, 0xC1A8u, 0x13C1u, 0x3862u, 0x4487u, 0x6F24u,
> + 0x6BE2u, 0x4041u, 0x3CA4u, 0x1707u, 0xC56Eu, 0xEECDu, 0x9228u, 0xB98Bu,
> + 0xD676u, 0xFDD5u, 0x8130u, 0xAA93u, 0x78FAu, 0x5359u, 0x2FBCu, 0x041Fu,
> + 0x00D9u, 0x2B7Au, 0x579Fu, 0x7C3Cu, 0xAE55u, 0x85F6u, 0xF913u, 0xD2B0u,
> + 0xF09Fu, 0xDB3Cu, 0xA7D9u, 0x8C7Au, 0x5E13u, 0x75B0u, 0x0955u, 0x22F6u,
> + 0x2630u, 0x0D93u, 0x7176u, 0x5AD5u, 0x88BCu, 0xA31Fu, 0xDFFAu, 0xF459u,
> + 0xBCFFu, 0x975Cu, 0xEBB9u, 0xC01Au, 0x1273u, 0x39D0u, 0x4535u, 0x6E96u,
> + 0x6A50u, 0x41F3u, 0x3D16u, 0x16B5u, 0xC4DCu, 0xEF7Fu, 0x939Au, 0xB839u,
> + 0x9A16u, 0xB1B5u, 0xCD50u, 0xE6F3u, 0x349Au, 0x1F39u, 0x63DCu, 0x487Fu,
> + 0x4CB9u, 0x671Au, 0x1BFFu, 0x305Cu, 0xE235u, 0xC996u, 0xB573u, 0x9ED0u,
> + 0xF12Du, 0xDA8Eu, 0xA66Bu, 0x8DC8u, 0x5FA1u, 0x7402u, 0x08E7u, 0x2344u,
> + 0x2782u, 0x0C21u, 0x70C4u, 0x5B67u, 0x890Eu, 0xA2ADu, 0xDE48u, 0xF5EBu,
> + 0xD7C4u, 0xFC67u, 0x8082u, 0xAB21u, 0x7948u, 0x52EBu, 0x2E0Eu, 0x05ADu,
> + 0x016Bu, 0x2AC8u, 0x562Du, 0x7D8Eu, 0xAFE7u, 0x8444u, 0xF8A1u, 0xD302u,
> + 0x275Bu, 0x0CF8u, 0x701Du, 0x5BBEu, 0x89D7u, 0xA274u, 0xDE91u, 0xF532u,
> + 0xF1F4u, 0xDA57u, 0xA6B2u, 0x8D11u, 0x5F78u, 0x74DBu, 0x083Eu, 0x239Du,
> + 0x01B2u, 0x2A11u, 0x56F4u, 0x7D57u, 0xAF3Eu, 0x849Du, 0xF878u, 0xD3DBu,
> + 0xD71Du, 0xFCBEu, 0x805Bu, 0xABF8u, 0x7991u, 0x5232u, 0x2ED7u, 0x0574u,
> + 0x6A89u, 0x412Au, 0x3DCFu, 0x166Cu, 0xC405u, 0xEFA6u, 0x9343u, 0xB8E0u,
> + 0xBC26u, 0x9785u, 0xEB60u, 0xC0C3u, 0x12AAu, 0x3909u, 0x45ECu, 0x6E4Fu,
> + 0x4C60u, 0x67C3u, 0x1B26u, 0x3085u, 0xE2ECu, 0xC94Fu, 0xB5AAu, 0x9E09u,
> + 0x9ACFu, 0xB16Cu, 0xCD89u, 0xE62Au, 0x3443u, 0x1FE0u, 0x6305u, 0x48A6u
> + },
> + {
> + 0x0000u, 0xF249u, 0x6F25u, 0x9D6Cu, 0xDE4Au, 0x2C03u, 0xB16Fu, 0x4326u,
> + 0x3723u, 0xC56Au, 0x5806u, 0xAA4Fu, 0xE969u, 0x1B20u, 0x864Cu, 0x7405u,
> + 0x6E46u, 0x9C0Fu, 0x0163u, 0xF32Au, 0xB00Cu, 0x4245u, 0xDF29u, 0x2D60u,
> + 0x5965u, 0xAB2Cu, 0x3640u, 0xC409u, 0x872Fu, 0x7566u, 0xE80Au, 0x1A43u,
> + 0xDC8Cu, 0x2EC5u, 0xB3A9u, 0x41E0u, 0x02C6u, 0xF08Fu, 0x6DE3u, 0x9FAAu,
> + 0xEBAFu, 0x19E6u, 0x848Au, 0x76C3u, 0x35E5u, 0xC7ACu, 0x5AC0u, 0xA889u,
> + 0xB2CAu, 0x4083u, 0xDDEFu, 0x2FA6u, 0x6C80u, 0x9EC9u, 0x03A5u, 0xF1ECu,
> + 0x85E9u, 0x77A0u, 0xEACCu, 0x1885u, 0x5BA3u, 0xA9EAu, 0x3486u, 0xC6CFu,
> + 0x32AFu, 0xC0E6u, 0x5D8Au, 0xAFC3u, 0xECE5u, 0x1EACu, 0x83C0u, 0x7189u,
> + 0x058Cu, 0xF7C5u, 0x6AA9u, 0x98E0u, 0xDBC6u, 0x298Fu, 0xB4E3u, 0x46AAu,
> + 0x5CE9u, 0xAEA0u, 0x33CCu, 0xC185u, 0x82A3u, 0x70EAu, 0xED86u, 0x1FCFu,
> + 0x6BCAu, 0x9983u, 0x04EFu, 0xF6A6u, 0xB580u, 0x47C9u, 0xDAA5u, 0x28ECu,
> + 0xEE23u, 0x1C6Au, 0x8106u, 0x734Fu, 0x3069u, 0xC220u, 0x5F4Cu, 0xAD05u,
> + 0xD900u, 0x2B49u, 0xB625u, 0x446Cu, 0x074Au, 0xF503u, 0x686Fu, 0x9A26u,
> + 0x8065u, 0x722Cu, 0xEF40u, 0x1D09u, 0x5E2Fu, 0xAC66u, 0x310Au, 0xC343u,
> + 0xB746u, 0x450Fu, 0xD863u, 0x2A2Au, 0x690Cu, 0x9B45u, 0x0629u, 0xF460u,
> + 0x655Eu, 0x9717u, 0x0A7Bu, 0xF832u, 0xBB14u, 0x495Du, 0xD431u, 0x2678u,
> + 0x527Du, 0xA034u, 0x3D58u, 0xCF11u, 0x8C37u, 0x7E7Eu, 0xE312u, 0x115Bu,
> + 0x0B18u, 0xF951u, 0x643Du, 0x9674u, 0xD552u, 0x271Bu, 0xBA77u, 0x483Eu,
> + 0x3C3Bu, 0xCE72u, 0x531Eu, 0xA157u, 0xE271u, 0x1038u, 0x8D54u, 0x7F1Du,
> + 0xB9D2u, 0x4B9Bu, 0xD6F7u, 0x24BEu, 0x6798u, 0x95D1u, 0x08BDu, 0xFAF4u,
> + 0x8EF1u, 0x7CB8u, 0xE1D4u, 0x139Du, 0x50BBu, 0xA2F2u, 0x3F9Eu, 0xCDD7u,
> + 0xD794u, 0x25DDu, 0xB8B1u, 0x4AF8u, 0x09DEu, 0xFB97u, 0x66FBu, 0x94B2u,
> + 0xE0B7u, 0x12FEu, 0x8F92u, 0x7DDBu, 0x3EFDu, 0xCCB4u, 0x51D8u, 0xA391u,
> + 0x57F1u, 0xA5B8u, 0x38D4u, 0xCA9Du, 0x89BBu, 0x7BF2u, 0xE69Eu, 0x14D7u,
> + 0x60D2u, 0x929Bu, 0x0FF7u, 0xFDBEu, 0xBE98u, 0x4CD1u, 0xD1BDu, 0x23F4u,
> + 0x39B7u, 0xCBFEu, 0x5692u, 0xA4DBu, 0xE7FDu, 0x15B4u, 0x88D8u, 0x7A91u,
> + 0x0E94u, 0xFCDDu, 0x61B1u, 0x93F8u, 0xD0DEu, 0x2297u, 0xBFFBu, 0x4DB2u,
> + 0x8B7Du, 0x7934u, 0xE458u, 0x1611u, 0x5537u, 0xA77Eu, 0x3A12u, 0xC85Bu,
> + 0xBC5Eu, 0x4E17u, 0xD37Bu, 0x2132u, 0x6214u, 0x905Du, 0x0D31u, 0xFF78u,
> + 0xE53Bu, 0x1772u, 0x8A1Eu, 0x7857u, 0x3B71u, 0xC938u, 0x5454u, 0xA61Du,
> + 0xD218u, 0x2051u, 0xBD3Du, 0x4F74u, 0x0C52u, 0xFE1Bu, 0x6377u, 0x913Eu
> + },
> + {
> + 0x0000u, 0xCABCu, 0x1ECFu, 0xD473u, 0x3D9Eu, 0xF722u, 0x2351u, 0xE9EDu,
> + 0x7B3Cu, 0xB180u, 0x65F3u, 0xAF4Fu, 0x46A2u, 0x8C1Eu, 0x586Du, 0x92D1u,
> + 0xF678u, 0x3CC4u, 0xE8B7u, 0x220Bu, 0xCBE6u, 0x015Au, 0xD529u, 0x1F95u,
> + 0x8D44u, 0x47F8u, 0x938Bu, 0x5937u, 0xB0DAu, 0x7A66u, 0xAE15u, 0x64A9u,
> + 0x6747u, 0xADFBu, 0x7988u, 0xB334u, 0x5AD9u, 0x9065u, 0x4416u, 0x8EAAu,
> + 0x1C7Bu, 0xD6C7u, 0x02B4u, 0xC808u, 0x21E5u, 0xEB59u, 0x3F2Au, 0xF596u,
> + 0x913Fu, 0x5B83u, 0x8FF0u, 0x454Cu, 0xACA1u, 0x661Du, 0xB26Eu, 0x78D2u,
> + 0xEA03u, 0x20BFu, 0xF4CCu, 0x3E70u, 0xD79Du, 0x1D21u, 0xC952u, 0x03EEu,
> + 0xCE8Eu, 0x0432u, 0xD041u, 0x1AFDu, 0xF310u, 0x39ACu, 0xEDDFu, 0x2763u,
> + 0xB5B2u, 0x7F0Eu, 0xAB7Du, 0x61C1u, 0x882Cu, 0x4290u, 0x96E3u, 0x5C5Fu,
> + 0x38F6u, 0xF24Au, 0x2639u, 0xEC85u, 0x0568u, 0xCFD4u, 0x1BA7u, 0xD11Bu,
> + 0x43CAu, 0x8976u, 0x5D05u, 0x97B9u, 0x7E54u, 0xB4E8u, 0x609Bu, 0xAA27u,
> + 0xA9C9u, 0x6375u, 0xB706u, 0x7DBAu, 0x9457u, 0x5EEBu, 0x8A98u, 0x4024u,
> + 0xD2F5u, 0x1849u, 0xCC3Au, 0x0686u, 0xEF6Bu, 0x25D7u, 0xF1A4u, 0x3B18u,
> + 0x5FB1u, 0x950Du, 0x417Eu, 0x8BC2u, 0x622Fu, 0xA893u, 0x7CE0u, 0xB65Cu,
> + 0x248Du, 0xEE31u, 0x3A42u, 0xF0FEu, 0x1913u, 0xD3AFu, 0x07DCu, 0xCD60u,
> + 0x16ABu, 0xDC17u, 0x0864u, 0xC2D8u, 0x2B35u, 0xE189u, 0x35FAu, 0xFF46u,
> + 0x6D97u, 0xA72Bu, 0x7358u, 0xB9E4u, 0x5009u, 0x9AB5u, 0x4EC6u, 0x847Au,
> + 0xE0D3u, 0x2A6Fu, 0xFE1Cu, 0x34A0u, 0xDD4Du, 0x17F1u, 0xC382u, 0x093Eu,
> + 0x9BEFu, 0x5153u, 0x8520u, 0x4F9Cu, 0xA671u, 0x6CCDu, 0xB8BEu, 0x7202u,
> + 0x71ECu, 0xBB50u, 0x6F23u, 0xA59Fu, 0x4C72u, 0x86CEu, 0x52BDu, 0x9801u,
> + 0x0AD0u, 0xC06Cu, 0x141Fu, 0xDEA3u, 0x374Eu, 0xFDF2u, 0x2981u, 0xE33Du,
> + 0x8794u, 0x4D28u, 0x995Bu, 0x53E7u, 0xBA0Au, 0x70B6u, 0xA4C5u, 0x6E79u,
> + 0xFCA8u, 0x3614u, 0xE267u, 0x28DBu, 0xC136u, 0x0B8Au, 0xDFF9u, 0x1545u,
> + 0xD825u, 0x1299u, 0xC6EAu, 0x0C56u, 0xE5BBu, 0x2F07u, 0xFB74u, 0x31C8u,
> + 0xA319u, 0x69A5u, 0xBDD6u, 0x776Au, 0x9E87u, 0x543Bu, 0x8048u, 0x4AF4u,
> + 0x2E5Du, 0xE4E1u, 0x3092u, 0xFA2Eu, 0x13C3u, 0xD97Fu, 0x0D0Cu, 0xC7B0u,
> + 0x5561u, 0x9FDDu, 0x4BAEu, 0x8112u, 0x68FFu, 0xA243u, 0x7630u, 0xBC8Cu,
> + 0xBF62u, 0x75DEu, 0xA1ADu, 0x6B11u, 0x82FCu, 0x4840u, 0x9C33u, 0x568Fu,
> + 0xC45Eu, 0x0EE2u, 0xDA91u, 0x102Du, 0xF9C0u, 0x337Cu, 0xE70Fu, 0x2DB3u,
> + 0x491Au, 0x83A6u, 0x57D5u, 0x9D69u, 0x7484u, 0xBE38u, 0x6A4Bu, 0xA0F7u,
> + 0x3226u, 0xF89Au, 0x2CE9u, 0xE655u, 0x0FB8u, 0xC504u, 0x1177u, 0xDBCBu
> + },
> + {
> + 0x0000u, 0x2D56u, 0x5AACu, 0x77FAu, 0xB558u, 0x980Eu, 0xEFF4u, 0xC2A2u,
> + 0xE107u, 0xCC51u, 0xBBABu, 0x96FDu, 0x545Fu, 0x7909u, 0x0EF3u, 0x23A5u,
> + 0x49B9u, 0x64EFu, 0x1315u, 0x3E43u, 0xFCE1u, 0xD1B7u, 0xA64Du, 0x8B1Bu,
> + 0xA8BEu, 0x85E8u, 0xF212u, 0xDF44u, 0x1DE6u, 0x30B0u, 0x474Au, 0x6A1Cu,
> + 0x9372u, 0xBE24u, 0xC9DEu, 0xE488u, 0x262Au, 0x0B7Cu, 0x7C86u, 0x51D0u,
> + 0x7275u, 0x5F23u, 0x28D9u, 0x058Fu, 0xC72Du, 0xEA7Bu, 0x9D81u, 0xB0D7u,
> + 0xDACBu, 0xF79Du, 0x8067u, 0xAD31u, 0x6F93u, 0x42C5u, 0x353Fu, 0x1869u,
> + 0x3BCCu, 0x169Au, 0x6160u, 0x4C36u, 0x8E94u, 0xA3C2u, 0xD438u, 0xF96Eu,
> + 0xAD53u, 0x8005u, 0xF7FFu, 0xDAA9u, 0x180Bu, 0x355Du, 0x42A7u, 0x6FF1u,
> + 0x4C54u, 0x6102u, 0x16F8u, 0x3BAEu, 0xF90Cu, 0xD45Au, 0xA3A0u, 0x8EF6u,
> + 0xE4EAu, 0xC9BCu, 0xBE46u, 0x9310u, 0x51B2u, 0x7CE4u, 0x0B1Eu, 0x2648u,
> + 0x05EDu, 0x28BBu, 0x5F41u, 0x7217u, 0xB0B5u, 0x9DE3u, 0xEA19u, 0xC74Fu,
> + 0x3E21u, 0x1377u, 0x648Du, 0x49DBu, 0x8B79u, 0xA62Fu, 0xD1D5u, 0xFC83u,
> + 0xDF26u, 0xF270u, 0x858Au, 0xA8DCu, 0x6A7Eu, 0x4728u, 0x30D2u, 0x1D84u,
> + 0x7798u, 0x5ACEu, 0x2D34u, 0x0062u, 0xC2C0u, 0xEF96u, 0x986Cu, 0xB53Au,
> + 0x969Fu, 0xBBC9u, 0xCC33u, 0xE165u, 0x23C7u, 0x0E91u, 0x796Bu, 0x543Du,
> + 0xD111u, 0xFC47u, 0x8BBDu, 0xA6EBu, 0x6449u, 0x491Fu, 0x3EE5u, 0x13B3u,
> + 0x3016u, 0x1D40u, 0x6ABAu, 0x47ECu, 0x854Eu, 0xA818u, 0xDFE2u, 0xF2B4u,
> + 0x98A8u, 0xB5FEu, 0xC204u, 0xEF52u, 0x2DF0u, 0x00A6u, 0x775Cu, 0x5A0Au,
> + 0x79AFu, 0x54F9u, 0x2303u, 0x0E55u, 0xCCF7u, 0xE1A1u, 0x965Bu, 0xBB0Du,
> + 0x4263u, 0x6F35u, 0x18CFu, 0x3599u, 0xF73Bu, 0xDA6Du, 0xAD97u, 0x80C1u,
> + 0xA364u, 0x8E32u, 0xF9C8u, 0xD49Eu, 0x163Cu, 0x3B6Au, 0x4C90u, 0x61C6u,
> + 0x0BDAu, 0x268Cu, 0x5176u, 0x7C20u, 0xBE82u, 0x93D4u, 0xE42Eu, 0xC978u,
> + 0xEADDu, 0xC78Bu, 0xB071u, 0x9D27u, 0x5F85u, 0x72D3u, 0x0529u, 0x287Fu,
> + 0x7C42u, 0x5114u, 0x26EEu, 0x0BB8u, 0xC91Au, 0xE44Cu, 0x93B6u, 0xBEE0u,
> + 0x9D45u, 0xB013u, 0xC7E9u, 0xEABFu, 0x281Du, 0x054Bu, 0x72B1u, 0x5FE7u,
> + 0x35FBu, 0x18ADu, 0x6F57u, 0x4201u, 0x80A3u, 0xADF5u, 0xDA0Fu, 0xF759u,
> + 0xD4FCu, 0xF9AAu, 0x8E50u, 0xA306u, 0x61A4u, 0x4CF2u, 0x3B08u, 0x165Eu,
> + 0xEF30u, 0xC266u, 0xB59Cu, 0x98CAu, 0x5A68u, 0x773Eu, 0x00C4u, 0x2D92u,
> + 0x0E37u, 0x2361u, 0x549Bu, 0x79CDu, 0xBB6Fu, 0x9639u, 0xE1C3u, 0xCC95u,
> + 0xA689u, 0x8BDFu, 0xFC25u, 0xD173u, 0x13D1u, 0x3E87u, 0x497Du, 0x642Bu,
> + 0x478Eu, 0x6AD8u, 0x1D22u, 0x3074u, 0xF2D6u, 0xDF80u, 0xA87Au, 0x852Cu
> + },
> + {
> + 0x0000u, 0x2995u, 0x532Au, 0x7ABFu, 0xA654u, 0x8FC1u, 0xF57Eu, 0xDCEBu,
> + 0xC71Fu, 0xEE8Au, 0x9435u, 0xBDA0u, 0x614Bu, 0x48DEu, 0x3261u, 0x1BF4u,
> + 0x0589u, 0x2C1Cu, 0x56A3u, 0x7F36u, 0xA3DDu, 0x8A48u, 0xF0F7u, 0xD962u,
> + 0xC296u, 0xEB03u, 0x91BCu, 0xB829u, 0x64C2u, 0x4D57u, 0x37E8u, 0x1E7Du,
> + 0x0B12u, 0x2287u, 0x5838u, 0x71ADu, 0xAD46u, 0x84D3u, 0xFE6Cu, 0xD7F9u,
> + 0xCC0Du, 0xE598u, 0x9F27u, 0xB6B2u, 0x6A59u, 0x43CCu, 0x3973u, 0x10E6u,
> + 0x0E9Bu, 0x270Eu, 0x5DB1u, 0x7424u, 0xA8CFu, 0x815Au, 0xFBE5u, 0xD270u,
> + 0xC984u, 0xE011u, 0x9AAEu, 0xB33Bu, 0x6FD0u, 0x4645u, 0x3CFAu, 0x156Fu,
> + 0x1624u, 0x3FB1u, 0x450Eu, 0x6C9Bu, 0xB070u, 0x99E5u, 0xE35Au, 0xCACFu,
> + 0xD13Bu, 0xF8AEu, 0x8211u, 0xAB84u, 0x776Fu, 0x5EFAu, 0x2445u, 0x0DD0u,
> + 0x13ADu, 0x3A38u, 0x4087u, 0x6912u, 0xB5F9u, 0x9C6Cu, 0xE6D3u, 0xCF46u,
> + 0xD4B2u, 0xFD27u, 0x8798u, 0xAE0Du, 0x72E6u, 0x5B73u, 0x21CCu, 0x0859u,
> + 0x1D36u, 0x34A3u, 0x4E1Cu, 0x6789u, 0xBB62u, 0x92F7u, 0xE848u, 0xC1DDu,
> + 0xDA29u, 0xF3BCu, 0x8903u, 0xA096u, 0x7C7Du, 0x55E8u, 0x2F57u, 0x06C2u,
> + 0x18BFu, 0x312Au, 0x4B95u, 0x6200u, 0xBEEBu, 0x977Eu, 0xEDC1u, 0xC454u,
> + 0xDFA0u, 0xF635u, 0x8C8Au, 0xA51Fu, 0x79F4u, 0x5061u, 0x2ADEu, 0x034Bu,
> + 0x2C48u, 0x05DDu, 0x7F62u, 0x56F7u, 0x8A1Cu, 0xA389u, 0xD936u, 0xF0A3u,
> + 0xEB57u, 0xC2C2u, 0xB87Du, 0x91E8u, 0x4D03u, 0x6496u, 0x1E29u, 0x37BCu,
> + 0x29C1u, 0x0054u, 0x7AEBu, 0x537Eu, 0x8F95u, 0xA600u, 0xDCBFu, 0xF52Au,
> + 0xEEDEu, 0xC74Bu, 0xBDF4u, 0x9461u, 0x488Au, 0x611Fu, 0x1BA0u, 0x3235u,
> + 0x275Au, 0x0ECFu, 0x7470u, 0x5DE5u, 0x810Eu, 0xA89Bu, 0xD224u, 0xFBB1u,
> + 0xE045u, 0xC9D0u, 0xB36Fu, 0x9AFAu, 0x4611u, 0x6F84u, 0x153Bu, 0x3CAEu,
> + 0x22D3u, 0x0B46u, 0x71F9u, 0x586Cu, 0x8487u, 0xAD12u, 0xD7ADu, 0xFE38u,
> + 0xE5CCu, 0xCC59u, 0xB6E6u, 0x9F73u, 0x4398u, 0x6A0Du, 0x10B2u, 0x3927u,
> + 0x3A6Cu, 0x13F9u, 0x6946u, 0x40D3u, 0x9C38u, 0xB5ADu, 0xCF12u, 0xE687u,
> + 0xFD73u, 0xD4E6u, 0xAE59u, 0x87CCu, 0x5B27u, 0x72B2u, 0x080Du, 0x2198u,
> + 0x3FE5u, 0x1670u, 0x6CCFu, 0x455Au, 0x99B1u, 0xB024u, 0xCA9Bu, 0xE30Eu,
> + 0xF8FAu, 0xD16Fu, 0xABD0u, 0x8245u, 0x5EAEu, 0x773Bu, 0x0D84u, 0x2411u,
> + 0x317Eu, 0x18EBu, 0x6254u, 0x4BC1u, 0x972Au, 0xBEBFu, 0xC400u, 0xED95u,
> + 0xF661u, 0xDFF4u, 0xA54Bu, 0x8CDEu, 0x5035u, 0x79A0u, 0x031Fu, 0x2A8Au,
> + 0x34F7u, 0x1D62u, 0x67DDu, 0x4E48u, 0x92A3u, 0xBB36u, 0xC189u, 0xE81Cu,
> + 0xF3E8u, 0xDA7Du, 0xA0C2u, 0x8957u, 0x55BCu, 0x7C29u, 0x0696u, 0x2F03u
> + },
> + {
> + 0x0000u, 0x5890u, 0xB120u, 0xE9B0u, 0xE9F7u, 0xB167u, 0x58D7u, 0x0047u,
> + 0x5859u, 0x00C9u, 0xE979u, 0xB1E9u, 0xB1AEu, 0xE93Eu, 0x008Eu, 0x581Eu,
> + 0xB0B2u, 0xE822u, 0x0192u, 0x5902u, 0x5945u, 0x01D5u, 0xE865u, 0xB0F5u,
> + 0xE8EBu, 0xB07Bu, 0x59CBu, 0x015Bu, 0x011Cu, 0x598Cu, 0xB03Cu, 0xE8ACu,
> + 0xEAD3u, 0xB243u, 0x5BF3u, 0x0363u, 0x0324u, 0x5BB4u, 0xB204u, 0xEA94u,
> + 0xB28Au, 0xEA1Au, 0x03AAu, 0x5B3Au, 0x5B7Du, 0x03EDu, 0xEA5Du, 0xB2CDu,
> + 0x5A61u, 0x02F1u, 0xEB41u, 0xB3D1u, 0xB396u, 0xEB06u, 0x02B6u, 0x5A26u,
> + 0x0238u, 0x5AA8u, 0xB318u, 0xEB88u, 0xEBCFu, 0xB35Fu, 0x5AEFu, 0x027Fu,
> + 0x5E11u, 0x0681u, 0xEF31u, 0xB7A1u, 0xB7E6u, 0xEF76u, 0x06C6u, 0x5E56u,
> + 0x0648u, 0x5ED8u, 0xB768u, 0xEFF8u, 0xEFBFu, 0xB72Fu, 0x5E9Fu, 0x060Fu,
> + 0xEEA3u, 0xB633u, 0x5F83u, 0x0713u, 0x0754u, 0x5FC4u, 0xB674u, 0xEEE4u,
> + 0xB6FAu, 0xEE6Au, 0x07DAu, 0x5F4Au, 0x5F0Du, 0x079Du, 0xEE2Du, 0xB6BDu,
> + 0xB4C2u, 0xEC52u, 0x05E2u, 0x5D72u, 0x5D35u, 0x05A5u, 0xEC15u, 0xB485u,
> + 0xEC9Bu, 0xB40Bu, 0x5DBBu, 0x052Bu, 0x056Cu, 0x5DFCu, 0xB44Cu, 0xECDCu,
> + 0x0470u, 0x5CE0u, 0xB550u, 0xEDC0u, 0xED87u, 0xB517u, 0x5CA7u, 0x0437u,
> + 0x5C29u, 0x04B9u, 0xED09u, 0xB599u, 0xB5DEu, 0xED4Eu, 0x04FEu, 0x5C6Eu,
> + 0xBC22u, 0xE4B2u, 0x0D02u, 0x5592u, 0x55D5u, 0x0D45u, 0xE4F5u, 0xBC65u,
> + 0xE47Bu, 0xBCEBu, 0x555Bu, 0x0DCBu, 0x0D8Cu, 0x551Cu, 0xBCACu, 0xE43Cu,
> + 0x0C90u, 0x5400u, 0xBDB0u, 0xE520u, 0xE567u, 0xBDF7u, 0x5447u, 0x0CD7u,
> + 0x54C9u, 0x0C59u, 0xE5E9u, 0xBD79u, 0xBD3Eu, 0xE5AEu, 0x0C1Eu, 0x548Eu,
> + 0x56F1u, 0x0E61u, 0xE7D1u, 0xBF41u, 0xBF06u, 0xE796u, 0x0E26u, 0x56B6u,
> + 0x0EA8u, 0x5638u, 0xBF88u, 0xE718u, 0xE75Fu, 0xBFCFu, 0x567Fu, 0x0EEFu,
> + 0xE643u, 0xBED3u, 0x5763u, 0x0FF3u, 0x0FB4u, 0x5724u, 0xBE94u, 0xE604u,
> + 0xBE1Au, 0xE68Au, 0x0F3Au, 0x57AAu, 0x57EDu, 0x0F7Du, 0xE6CDu, 0xBE5Du,
> + 0xE233u, 0xBAA3u, 0x5313u, 0x0B83u, 0x0BC4u, 0x5354u, 0xBAE4u, 0xE274u,
> + 0xBA6Au, 0xE2FAu, 0x0B4Au, 0x53DAu, 0x539Du, 0x0B0Du, 0xE2BDu, 0xBA2Du,
> + 0x5281u, 0x0A11u, 0xE3A1u, 0xBB31u, 0xBB76u, 0xE3E6u, 0x0A56u, 0x52C6u,
> + 0x0AD8u, 0x5248u, 0xBBF8u, 0xE368u, 0xE32Fu, 0xBBBFu, 0x520Fu, 0x0A9Fu,
> + 0x08E0u, 0x5070u, 0xB9C0u, 0xE150u, 0xE117u, 0xB987u, 0x5037u, 0x08A7u,
> + 0x50B9u, 0x0829u, 0xE199u, 0xB909u, 0xB94Eu, 0xE1DEu, 0x086Eu, 0x50FEu,
> + 0xB852u, 0xE0C2u, 0x0972u, 0x51E2u, 0x51A5u, 0x0935u, 0xE085u, 0xB815u,
> + 0xE00Bu, 0xB89Bu, 0x512Bu, 0x09BBu, 0x09FCu, 0x516Cu, 0xB8DCu, 0xE04Cu
> + },
> + {
> + 0x0000u, 0xF3F3u, 0x6C51u, 0x9FA2u, 0xD8A2u, 0x2B51u, 0xB4F3u, 0x4700u,
> + 0x3AF3u, 0xC900u, 0x56A2u, 0xA551u, 0xE251u, 0x11A2u, 0x8E00u, 0x7DF3u,
> + 0x75E6u, 0x8615u, 0x19B7u, 0xEA44u, 0xAD44u, 0x5EB7u, 0xC115u, 0x32E6u,
> + 0x4F15u, 0xBCE6u, 0x2344u, 0xD0B7u, 0x97B7u, 0x6444u, 0xFBE6u, 0x0815u,
> + 0xEBCCu, 0x183Fu, 0x879Du, 0x746Eu, 0x336Eu, 0xC09Du, 0x5F3Fu, 0xACCCu,
> + 0xD13Fu, 0x22CCu, 0xBD6Eu, 0x4E9Du, 0x099Du, 0xFA6Eu, 0x65CCu, 0x963Fu,
> + 0x9E2Au, 0x6DD9u, 0xF27Bu, 0x0188u, 0x4688u, 0xB57Bu, 0x2AD9u, 0xD92Au,
> + 0xA4D9u, 0x572Au, 0xC888u, 0x3B7Bu, 0x7C7Bu, 0x8F88u, 0x102Au, 0xE3D9u,
> + 0x5C2Fu, 0xAFDCu, 0x307Eu, 0xC38Du, 0x848Du, 0x777Eu, 0xE8DCu, 0x1B2Fu,
> + 0x66DCu, 0x952Fu, 0x0A8Du, 0xF97Eu, 0xBE7Eu, 0x4D8Du, 0xD22Fu, 0x21DCu,
> + 0x29C9u, 0xDA3Au, 0x4598u, 0xB66Bu, 0xF16Bu, 0x0298u, 0x9D3Au, 0x6EC9u,
> + 0x133Au, 0xE0C9u, 0x7F6Bu, 0x8C98u, 0xCB98u, 0x386Bu, 0xA7C9u, 0x543Au,
> + 0xB7E3u, 0x4410u, 0xDBB2u, 0x2841u, 0x6F41u, 0x9CB2u, 0x0310u, 0xF0E3u,
> + 0x8D10u, 0x7EE3u, 0xE141u, 0x12B2u, 0x55B2u, 0xA641u, 0x39E3u, 0xCA10u,
> + 0xC205u, 0x31F6u, 0xAE54u, 0x5DA7u, 0x1AA7u, 0xE954u, 0x76F6u, 0x8505u,
> + 0xF8F6u, 0x0B05u, 0x94A7u, 0x6754u, 0x2054u, 0xD3A7u, 0x4C05u, 0xBFF6u,
> + 0xB85Eu, 0x4BADu, 0xD40Fu, 0x27FCu, 0x60FCu, 0x930Fu, 0x0CADu, 0xFF5Eu,
> + 0x82ADu, 0x715Eu, 0xEEFCu, 0x1D0Fu, 0x5A0Fu, 0xA9FCu, 0x365Eu, 0xC5ADu,
> + 0xCDB8u, 0x3E4Bu, 0xA1E9u, 0x521Au, 0x151Au, 0xE6E9u, 0x794Bu, 0x8AB8u,
> + 0xF74Bu, 0x04B8u, 0x9B1Au, 0x68E9u, 0x2FE9u, 0xDC1Au, 0x43B8u, 0xB04Bu,
> + 0x5392u, 0xA061u, 0x3FC3u, 0xCC30u, 0x8B30u, 0x78C3u, 0xE761u, 0x1492u,
> + 0x6961u, 0x9A92u, 0x0530u, 0xF6C3u, 0xB1C3u, 0x4230u, 0xDD92u, 0x2E61u,
> + 0x2674u, 0xD587u, 0x4A25u, 0xB9D6u, 0xFED6u, 0x0D25u, 0x9287u, 0x6174u,
> + 0x1C87u, 0xEF74u, 0x70D6u, 0x8325u, 0xC425u, 0x37D6u, 0xA874u, 0x5B87u,
> + 0xE471u, 0x1782u, 0x8820u, 0x7BD3u, 0x3CD3u, 0xCF20u, 0x5082u, 0xA371u,
> + 0xDE82u, 0x2D71u, 0xB2D3u, 0x4120u, 0x0620u, 0xF5D3u, 0x6A71u, 0x9982u,
> + 0x9197u, 0x6264u, 0xFDC6u, 0x0E35u, 0x4935u, 0xBAC6u, 0x2564u, 0xD697u,
> + 0xAB64u, 0x5897u, 0xC735u, 0x34C6u, 0x73C6u, 0x8035u, 0x1F97u, 0xEC64u,
> + 0x0FBDu, 0xFC4Eu, 0x63ECu, 0x901Fu, 0xD71Fu, 0x24ECu, 0xBB4Eu, 0x48BDu,
> + 0x354Eu, 0xC6BDu, 0x591Fu, 0xAAECu, 0xEDECu, 0x1E1Fu, 0x81BDu, 0x724Eu,
> + 0x7A5Bu, 0x89A8u, 0x160Au, 0xE5F9u, 0xA2F9u, 0x510Au, 0xCEA8u, 0x3D5Bu,
> + 0x40A8u, 0xB35Bu, 0x2CF9u, 0xDF0Au, 0x980Au, 0x6BF9u, 0xF45Bu, 0x07A8u
> + },
> + {
> + 0x0000u, 0xFB0Bu, 0x7DA1u, 0x86AAu, 0xFB42u, 0x0049u, 0x86E3u, 0x7DE8u,
> + 0x7D33u, 0x8638u, 0x0092u, 0xFB99u, 0x8671u, 0x7D7Au, 0xFBD0u, 0x00DBu,
> + 0xFA66u, 0x016Du, 0x87C7u, 0x7CCCu, 0x0124u, 0xFA2Fu, 0x7C85u, 0x878Eu,
> + 0x8755u, 0x7C5Eu, 0xFAF4u, 0x01FFu, 0x7C17u, 0x871Cu, 0x01B6u, 0xFABDu,
> + 0x7F7Bu, 0x8470u, 0x02DAu, 0xF9D1u, 0x8439u, 0x7F32u, 0xF998u, 0x0293u,
> + 0x0248u, 0xF943u, 0x7FE9u, 0x84E2u, 0xF90Au, 0x0201u, 0x84ABu, 0x7FA0u,
> + 0x851Du, 0x7E16u, 0xF8BCu, 0x03B7u, 0x7E5Fu, 0x8554u, 0x03FEu, 0xF8F5u,
> + 0xF82Eu, 0x0325u, 0x858Fu, 0x7E84u, 0x036Cu, 0xF867u, 0x7ECDu, 0x85C6u,
> + 0xFEF6u, 0x05FDu, 0x8357u, 0x785Cu, 0x05B4u, 0xFEBFu, 0x7815u, 0x831Eu,
> + 0x83C5u, 0x78CEu, 0xFE64u, 0x056Fu, 0x7887u, 0x838Cu, 0x0526u, 0xFE2Du,
> + 0x0490u, 0xFF9Bu, 0x7931u, 0x823Au, 0xFFD2u, 0x04D9u, 0x8273u, 0x7978u,
> + 0x79A3u, 0x82A8u, 0x0402u, 0xFF09u, 0x82E1u, 0x79EAu, 0xFF40u, 0x044Bu,
> + 0x818Du, 0x7A86u, 0xFC2Cu, 0x0727u, 0x7ACFu, 0x81C4u, 0x076Eu, 0xFC65u,
> + 0xFCBEu, 0x07B5u, 0x811Fu, 0x7A14u, 0x07FCu, 0xFCF7u, 0x7A5Du, 0x8156u,
> + 0x7BEBu, 0x80E0u, 0x064Au, 0xFD41u, 0x80A9u, 0x7BA2u, 0xFD08u, 0x0603u,
> + 0x06D8u, 0xFDD3u, 0x7B79u, 0x8072u, 0xFD9Au, 0x0691u, 0x803Bu, 0x7B30u,
> + 0x765Bu, 0x8D50u, 0x0BFAu, 0xF0F1u, 0x8D19u, 0x7612u, 0xF0B8u, 0x0BB3u,
> + 0x0B68u, 0xF063u, 0x76C9u, 0x8DC2u, 0xF02Au, 0x0B21u, 0x8D8Bu, 0x7680u,
> + 0x8C3Du, 0x7736u, 0xF19Cu, 0x0A97u, 0x777Fu, 0x8C74u, 0x0ADEu, 0xF1D5u,
> + 0xF10Eu, 0x0A05u, 0x8CAFu, 0x77A4u, 0x0A4Cu, 0xF147u, 0x77EDu, 0x8CE6u,
> + 0x0920u, 0xF22Bu, 0x7481u, 0x8F8Au, 0xF262u, 0x0969u, 0x8FC3u, 0x74C8u,
> + 0x7413u, 0x8F18u, 0x09B2u, 0xF2B9u, 0x8F51u, 0x745Au, 0xF2F0u, 0x09FBu,
> + 0xF346u, 0x084Du, 0x8EE7u, 0x75ECu, 0x0804u, 0xF30Fu, 0x75A5u, 0x8EAEu,
> + 0x8E75u, 0x757Eu, 0xF3D4u, 0x08DFu, 0x7537u, 0x8E3Cu, 0x0896u, 0xF39Du,
> + 0x88ADu, 0x73A6u, 0xF50Cu, 0x0E07u, 0x73EFu, 0x88E4u, 0x0E4Eu, 0xF545u,
> + 0xF59Eu, 0x0E95u, 0x883Fu, 0x7334u, 0x0EDCu, 0xF5D7u, 0x737Du, 0x8876u,
> + 0x72CBu, 0x89C0u, 0x0F6Au, 0xF461u, 0x8989u, 0x7282u, 0xF428u, 0x0F23u,
> + 0x0FF8u, 0xF4F3u, 0x7259u, 0x8952u, 0xF4BAu, 0x0FB1u, 0x891Bu, 0x7210u,
> + 0xF7D6u, 0x0CDDu, 0x8A77u, 0x717Cu, 0x0C94u, 0xF79Fu, 0x7135u, 0x8A3Eu,
> + 0x8AE5u, 0x71EEu, 0xF744u, 0x0C4Fu, 0x71A7u, 0x8AACu, 0x0C06u, 0xF70Du,
> + 0x0DB0u, 0xF6BBu, 0x7011u, 0x8B1Au, 0xF6F2u, 0x0DF9u, 0x8B53u, 0x7058u,
> + 0x7083u, 0x8B88u, 0x0D22u, 0xF629u, 0x8BC1u, 0x70CAu, 0xF660u, 0x0D6Bu
> + },
> + {
> + 0x0000u, 0xECB6u, 0x52DBu, 0xBE6Du, 0xA5B6u, 0x4900u, 0xF76Du, 0x1BDBu,
> + 0xC0DBu, 0x2C6Du, 0x9200u, 0x7EB6u, 0x656Du, 0x89DBu, 0x37B6u, 0xDB00u,
> + 0x0A01u, 0xE6B7u, 0x58DAu, 0xB46Cu, 0xAFB7u, 0x4301u, 0xFD6Cu, 0x11DAu,
> + 0xCADAu, 0x266Cu, 0x9801u, 0x74B7u, 0x6F6Cu, 0x83DAu, 0x3DB7u, 0xD101u,
> + 0x1402u, 0xF8B4u, 0x46D9u, 0xAA6Fu, 0xB1B4u, 0x5D02u, 0xE36Fu, 0x0FD9u,
> + 0xD4D9u, 0x386Fu, 0x8602u, 0x6AB4u, 0x716Fu, 0x9DD9u, 0x23B4u, 0xCF02u,
> + 0x1E03u, 0xF2B5u, 0x4CD8u, 0xA06Eu, 0xBBB5u, 0x5703u, 0xE96Eu, 0x05D8u,
> + 0xDED8u, 0x326Eu, 0x8C03u, 0x60B5u, 0x7B6Eu, 0x97D8u, 0x29B5u, 0xC503u,
> + 0x2804u, 0xC4B2u, 0x7ADFu, 0x9669u, 0x8DB2u, 0x6104u, 0xDF69u, 0x33DFu,
> + 0xE8DFu, 0x0469u, 0xBA04u, 0x56B2u, 0x4D69u, 0xA1DFu, 0x1FB2u, 0xF304u,
> + 0x2205u, 0xCEB3u, 0x70DEu, 0x9C68u, 0x87B3u, 0x6B05u, 0xD568u, 0x39DEu,
> + 0xE2DEu, 0x0E68u, 0xB005u, 0x5CB3u, 0x4768u, 0xABDEu, 0x15B3u, 0xF905u,
> + 0x3C06u, 0xD0B0u, 0x6EDDu, 0x826Bu, 0x99B0u, 0x7506u, 0xCB6Bu, 0x27DDu,
> + 0xFCDDu, 0x106Bu, 0xAE06u, 0x42B0u, 0x596Bu, 0xB5DDu, 0x0BB0u, 0xE706u,
> + 0x3607u, 0xDAB1u, 0x64DCu, 0x886Au, 0x93B1u, 0x7F07u, 0xC16Au, 0x2DDCu,
> + 0xF6DCu, 0x1A6Au, 0xA407u, 0x48B1u, 0x536Au, 0xBFDCu, 0x01B1u, 0xED07u,
> + 0x5008u, 0xBCBEu, 0x02D3u, 0xEE65u, 0xF5BEu, 0x1908u, 0xA765u, 0x4BD3u,
> + 0x90D3u, 0x7C65u, 0xC208u, 0x2EBEu, 0x3565u, 0xD9D3u, 0x67BEu, 0x8B08u,
> + 0x5A09u, 0xB6BFu, 0x08D2u, 0xE464u, 0xFFBFu, 0x1309u, 0xAD64u, 0x41D2u,
> + 0x9AD2u, 0x7664u, 0xC809u, 0x24BFu, 0x3F64u, 0xD3D2u, 0x6DBFu, 0x8109u,
> + 0x440Au, 0xA8BCu, 0x16D1u, 0xFA67u, 0xE1BCu, 0x0D0Au, 0xB367u, 0x5FD1u,
> + 0x84D1u, 0x6867u, 0xD60Au, 0x3ABCu, 0x2167u, 0xCDD1u, 0x73BCu, 0x9F0Au,
> + 0x4E0Bu, 0xA2BDu, 0x1CD0u, 0xF066u, 0xEBBDu, 0x070Bu, 0xB966u, 0x55D0u,
> + 0x8ED0u, 0x6266u, 0xDC0Bu, 0x30BDu, 0x2B66u, 0xC7D0u, 0x79BDu, 0x950Bu,
> + 0x780Cu, 0x94BAu, 0x2AD7u, 0xC661u, 0xDDBAu, 0x310Cu, 0x8F61u, 0x63D7u,
> + 0xB8D7u, 0x5461u, 0xEA0Cu, 0x06BAu, 0x1D61u, 0xF1D7u, 0x4FBAu, 0xA30Cu,
> + 0x720Du, 0x9EBBu, 0x20D6u, 0xCC60u, 0xD7BBu, 0x3B0Du, 0x8560u, 0x69D6u,
> + 0xB2D6u, 0x5E60u, 0xE00Du, 0x0CBBu, 0x1760u, 0xFBD6u, 0x45BBu, 0xA90Du,
> + 0x6C0Eu, 0x80B8u, 0x3ED5u, 0xD263u, 0xC9B8u, 0x250Eu, 0x9B63u, 0x77D5u,
> + 0xACD5u, 0x4063u, 0xFE0Eu, 0x12B8u, 0x0963u, 0xE5D5u, 0x5BB8u, 0xB70Eu,
> + 0x660Fu, 0x8AB9u, 0x34D4u, 0xD862u, 0xC3B9u, 0x2F0Fu, 0x9162u, 0x7DD4u,
> + 0xA6D4u, 0x4A62u, 0xF40Fu, 0x18B9u, 0x0362u, 0xEFD4u, 0x51B9u, 0xBD0Fu
> + },
> + {
> + 0x0000u, 0xA010u, 0xCB97u, 0x6B87u, 0x1C99u, 0xBC89u, 0xD70Eu, 0x771Eu,
> + 0x3932u, 0x9922u, 0xF2A5u, 0x52B5u, 0x25ABu, 0x85BBu, 0xEE3Cu, 0x4E2Cu,
> + 0x7264u, 0xD274u, 0xB9F3u, 0x19E3u, 0x6EFDu, 0xCEEDu, 0xA56Au, 0x057Au,
> + 0x4B56u, 0xEB46u, 0x80C1u, 0x20D1u, 0x57CFu, 0xF7DFu, 0x9C58u, 0x3C48u,
> + 0xE4C8u, 0x44D8u, 0x2F5Fu, 0x8F4Fu, 0xF851u, 0x5841u, 0x33C6u, 0x93D6u,
> + 0xDDFAu, 0x7DEAu, 0x166Du, 0xB67Du, 0xC163u, 0x6173u, 0x0AF4u, 0xAAE4u,
> + 0x96ACu, 0x36BCu, 0x5D3Bu, 0xFD2Bu, 0x8A35u, 0x2A25u, 0x41A2u, 0xE1B2u,
> + 0xAF9Eu, 0x0F8Eu, 0x6409u, 0xC419u, 0xB307u, 0x1317u, 0x7890u, 0xD880u,
> + 0x4227u, 0xE237u, 0x89B0u, 0x29A0u, 0x5EBEu, 0xFEAEu, 0x9529u, 0x3539u,
> + 0x7B15u, 0xDB05u, 0xB082u, 0x1092u, 0x678Cu, 0xC79Cu, 0xAC1Bu, 0x0C0Bu,
> + 0x3043u, 0x9053u, 0xFBD4u, 0x5BC4u, 0x2CDAu, 0x8CCAu, 0xE74Du, 0x475Du,
> + 0x0971u, 0xA961u, 0xC2E6u, 0x62F6u, 0x15E8u, 0xB5F8u, 0xDE7Fu, 0x7E6Fu,
> + 0xA6EFu, 0x06FFu, 0x6D78u, 0xCD68u, 0xBA76u, 0x1A66u, 0x71E1u, 0xD1F1u,
> + 0x9FDDu, 0x3FCDu, 0x544Au, 0xF45Au, 0x8344u, 0x2354u, 0x48D3u, 0xE8C3u,
> + 0xD48Bu, 0x749Bu, 0x1F1Cu, 0xBF0Cu, 0xC812u, 0x6802u, 0x0385u, 0xA395u,
> + 0xEDB9u, 0x4DA9u, 0x262Eu, 0x863Eu, 0xF120u, 0x5130u, 0x3AB7u, 0x9AA7u,
> + 0x844Eu, 0x245Eu, 0x4FD9u, 0xEFC9u, 0x98D7u, 0x38C7u, 0x5340u, 0xF350u,
> + 0xBD7Cu, 0x1D6Cu, 0x76EBu, 0xD6FBu, 0xA1E5u, 0x01F5u, 0x6A72u, 0xCA62u,
> + 0xF62Au, 0x563Au, 0x3DBDu, 0x9DADu, 0xEAB3u, 0x4AA3u, 0x2124u, 0x8134u,
> + 0xCF18u, 0x6F08u, 0x048Fu, 0xA49Fu, 0xD381u, 0x7391u, 0x1816u, 0xB806u,
> + 0x6086u, 0xC096u, 0xAB11u, 0x0B01u, 0x7C1Fu, 0xDC0Fu, 0xB788u, 0x1798u,
> + 0x59B4u, 0xF9A4u, 0x9223u, 0x3233u, 0x452Du, 0xE53Du, 0x8EBAu, 0x2EAAu,
> + 0x12E2u, 0xB2F2u, 0xD975u, 0x7965u, 0x0E7Bu, 0xAE6Bu, 0xC5ECu, 0x65FCu,
> + 0x2BD0u, 0x8BC0u, 0xE047u, 0x4057u, 0x3749u, 0x9759u, 0xFCDEu, 0x5CCEu,
> + 0xC669u, 0x6679u, 0x0DFEu, 0xADEEu, 0xDAF0u, 0x7AE0u, 0x1167u, 0xB177u,
> + 0xFF5Bu, 0x5F4Bu, 0x34CCu, 0x94DCu, 0xE3C2u, 0x43D2u, 0x2855u, 0x8845u,
> + 0xB40Du, 0x141Du, 0x7F9Au, 0xDF8Au, 0xA894u, 0x0884u, 0x6303u, 0xC313u,
> + 0x8D3Fu, 0x2D2Fu, 0x46A8u, 0xE6B8u, 0x91A6u, 0x31B6u, 0x5A31u, 0xFA21u,
> + 0x22A1u, 0x82B1u, 0xE936u, 0x4926u, 0x3E38u, 0x9E28u, 0xF5AFu, 0x55BFu,
> + 0x1B93u, 0xBB83u, 0xD004u, 0x7014u, 0x070Au, 0xA71Au, 0xCC9Du, 0x6C8Du,
> + 0x50C5u, 0xF0D5u, 0x9B52u, 0x3B42u, 0x4C5Cu, 0xEC4Cu, 0x87CBu, 0x27DBu,
> + 0x69F7u, 0xC9E7u, 0xA260u, 0x0270u, 0x756Eu, 0xD57Eu, 0xBEF9u, 0x1EE9u
> + },
> + {
> + 0x0000u, 0x832Bu, 0x8DE1u, 0x0ECAu, 0x9075u, 0x135Eu, 0x1D94u, 0x9EBFu,
> + 0xAB5Du, 0x2876u, 0x26BCu, 0xA597u, 0x3B28u, 0xB803u, 0xB6C9u, 0x35E2u,
> + 0xDD0Du, 0x5E26u, 0x50ECu, 0xD3C7u, 0x4D78u, 0xCE53u, 0xC099u, 0x43B2u,
> + 0x7650u, 0xF57Bu, 0xFBB1u, 0x789Au, 0xE625u, 0x650Eu, 0x6BC4u, 0xE8EFu,
> + 0x31ADu, 0xB286u, 0xBC4Cu, 0x3F67u, 0xA1D8u, 0x22F3u, 0x2C39u, 0xAF12u,
> + 0x9AF0u, 0x19DBu, 0x1711u, 0x943Au, 0x0A85u, 0x89AEu, 0x8764u, 0x044Fu,
> + 0xECA0u, 0x6F8Bu, 0x6141u, 0xE26Au, 0x7CD5u, 0xFFFEu, 0xF134u, 0x721Fu,
> + 0x47FDu, 0xC4D6u, 0xCA1Cu, 0x4937u, 0xD788u, 0x54A3u, 0x5A69u, 0xD942u,
> + 0x635Au, 0xE071u, 0xEEBBu, 0x6D90u, 0xF32Fu, 0x7004u, 0x7ECEu, 0xFDE5u,
> + 0xC807u, 0x4B2Cu, 0x45E6u, 0xC6CDu, 0x5872u, 0xDB59u, 0xD593u, 0x56B8u,
> + 0xBE57u, 0x3D7Cu, 0x33B6u, 0xB09Du, 0x2E22u, 0xAD09u, 0xA3C3u, 0x20E8u,
> + 0x150Au, 0x9621u, 0x98EBu, 0x1BC0u, 0x857Fu, 0x0654u, 0x089Eu, 0x8BB5u,
> + 0x52F7u, 0xD1DCu, 0xDF16u, 0x5C3Du, 0xC282u, 0x41A9u, 0x4F63u, 0xCC48u,
> + 0xF9AAu, 0x7A81u, 0x744Bu, 0xF760u, 0x69DFu, 0xEAF4u, 0xE43Eu, 0x6715u,
> + 0x8FFAu, 0x0CD1u, 0x021Bu, 0x8130u, 0x1F8Fu, 0x9CA4u, 0x926Eu, 0x1145u,
> + 0x24A7u, 0xA78Cu, 0xA946u, 0x2A6Du, 0xB4D2u, 0x37F9u, 0x3933u, 0xBA18u,
> + 0xC6B4u, 0x459Fu, 0x4B55u, 0xC87Eu, 0x56C1u, 0xD5EAu, 0xDB20u, 0x580Bu,
> + 0x6DE9u, 0xEEC2u, 0xE008u, 0x6323u, 0xFD9Cu, 0x7EB7u, 0x707Du, 0xF356u,
> + 0x1BB9u, 0x9892u, 0x9658u, 0x1573u, 0x8BCCu, 0x08E7u, 0x062Du, 0x8506u,
> + 0xB0E4u, 0x33CFu, 0x3D05u, 0xBE2Eu, 0x2091u, 0xA3BAu, 0xAD70u, 0x2E5Bu,
> + 0xF719u, 0x7432u, 0x7AF8u, 0xF9D3u, 0x676Cu, 0xE447u, 0xEA8Du, 0x69A6u,
> + 0x5C44u, 0xDF6Fu, 0xD1A5u, 0x528Eu, 0xCC31u, 0x4F1Au, 0x41D0u, 0xC2FBu,
> + 0x2A14u, 0xA93Fu, 0xA7F5u, 0x24DEu, 0xBA61u, 0x394Au, 0x3780u, 0xB4ABu,
> + 0x8149u, 0x0262u, 0x0CA8u, 0x8F83u, 0x113Cu, 0x9217u, 0x9CDDu, 0x1FF6u,
> + 0xA5EEu, 0x26C5u, 0x280Fu, 0xAB24u, 0x359Bu, 0xB6B0u, 0xB87Au, 0x3B51u,
> + 0x0EB3u, 0x8D98u, 0x8352u, 0x0079u, 0x9EC6u, 0x1DEDu, 0x1327u, 0x900Cu,
> + 0x78E3u, 0xFBC8u, 0xF502u, 0x7629u, 0xE896u, 0x6BBDu, 0x6577u, 0xE65Cu,
> + 0xD3BEu, 0x5095u, 0x5E5Fu, 0xDD74u, 0x43CBu, 0xC0E0u, 0xCE2Au, 0x4D01u,
> + 0x9443u, 0x1768u, 0x19A2u, 0x9A89u, 0x0436u, 0x871Du, 0x89D7u, 0x0AFCu,
> + 0x3F1Eu, 0xBC35u, 0xB2FFu, 0x31D4u, 0xAF6Bu, 0x2C40u, 0x228Au, 0xA1A1u,
> + 0x494Eu, 0xCA65u, 0xC4AFu, 0x4784u, 0xD93Bu, 0x5A10u, 0x54DAu, 0xD7F1u,
> + 0xE213u, 0x6138u, 0x6FF2u, 0xECD9u, 0x7266u, 0xF14Du, 0xFF87u, 0x7CACu
> + }
> };
>
> __u16 crc_t10dif_generic(__u16 crc, const unsigned char *buffer, size_t len)
> {
> - unsigned int i;
> + const __u8 *i = (const __u8 *)buffer;
> + const __u8 *i_end = i + len;
> + const __u8 *i_last16 = i + (len / 16 * 16);

Why is i_last16 a u8? The len parameter of the buffer can be much larger
than a u8 can hold. It seems the CRC computation will miss the rest of the
buffer if its length is much greater than 256.

Tim

>
> - for (i = 0 ; i < len ; i++)
> - crc = (crc << 8) ^ t10_dif_crc_table[((crc >> 8) ^ buffer[i]) & 0xff];
> + for (; i < i_last16; i += 16) {
> + crc = t10_dif_crc_table[15][i[0] ^ (__u8)(crc >> 8)] ^
> + t10_dif_crc_table[14][i[1] ^ (__u8)(crc >> 0)] ^
> + t10_dif_crc_table[13][i[2]] ^
> + t10_dif_crc_table[12][i[3]] ^
> + t10_dif_crc_table[11][i[4]] ^
> + t10_dif_crc_table[10][i[5]] ^
> + t10_dif_crc_table[9][i[6]] ^
> + t10_dif_crc_table[8][i[7]] ^
> + t10_dif_crc_table[7][i[8]] ^
> + t10_dif_crc_table[6][i[9]] ^
> + t10_dif_crc_table[5][i[10]] ^
> + t10_dif_crc_table[4][i[11]] ^
> + t10_dif_crc_table[3][i[12]] ^
> + t10_dif_crc_table[2][i[13]] ^
> + t10_dif_crc_table[1][i[14]] ^
> + t10_dif_crc_table[0][i[15]];
> + }
> +
> + for (; i < i_end; i++)
> + crc = t10_dif_crc_table[0][*i ^ (__u8)(crc >> 8)] ^ (crc << 8);
>
> return crc;
> }
>

2018-08-15 12:51:45

by Jeffrey Lien

[permalink] [raw]
Subject: RE: [PATCH] Performance Improvement in CRC16 Calculations.

Tim,
To answer your question, "Why is i_last16 a u8?": it's not a u8; it's a u8 * (a pointer into the buffer), so it doesn't limit the length to 256. Only the bytes it points at are 8 bits wide.
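
To illustrate the point outside the kernel, here is a minimal userspace sketch (not part of the patch). It assumes the extra tables can be derived from the generator polynomial 0x8BB7 rather than spelled out; build_tables(), crc_bytewise(), crc_slice16() and crc_match() are hypothetical names chosen for this example. The 16-byte loop mirrors the one in the patch, and crc_match() compares it against the old byte-at-a-time loop on buffers longer than 256 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define T10DIF_POLY 0x8BB7u	/* generator polynomial from the patch comment */

static uint16_t crc_table[16][256];

/* crc_table[k][n] is the CRC of byte n followed by k zero bytes. */
static void build_tables(void)
{
	for (unsigned n = 0; n < 256; n++) {
		uint16_t crc = (uint16_t)(n << 8);

		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ T10DIF_POLY)
					      : (uint16_t)(crc << 1);
		crc_table[0][n] = crc;
	}
	for (int k = 1; k < 16; k++)
		for (unsigned n = 0; n < 256; n++) {
			uint16_t prev = crc_table[k - 1][n];

			/* push one more zero byte through the byte-wise step */
			crc_table[k][n] = (uint16_t)((prev << 8) ^ crc_table[0][prev >> 8]);
		}
	/* sanity check against the second entry of the original table */
	assert(crc_table[0][1] == 0x8BB7u);
}

/* The original one-byte-per-iteration loop. */
static uint16_t crc_bytewise(uint16_t crc, const unsigned char *buffer, size_t len)
{
	for (size_t n = 0; n < len; n++)
		crc = (uint16_t)((crc << 8) ^ crc_table[0][((crc >> 8) ^ buffer[n]) & 0xff]);
	return crc;
}

/* Same structure as the patched loop: 16 bytes per iteration, then the tail. */
static uint16_t crc_slice16(uint16_t crc, const unsigned char *buffer, size_t len)
{
	const uint8_t *i = buffer;			/* pointers, not 8-bit counters */
	const uint8_t *i_end = i + len;
	const uint8_t *i_last16 = i + (len / 16 * 16);

	for (; i < i_last16; i += 16) {
		uint16_t next = (uint16_t)(crc_table[15][i[0] ^ (uint8_t)(crc >> 8)] ^
					   crc_table[14][i[1] ^ (uint8_t)crc]);

		for (int k = 2; k < 16; k++)
			next ^= crc_table[15 - k][i[k]];
		crc = next;
	}
	for (; i < i_end; i++)
		crc = (uint16_t)(crc_table[0][*i ^ (uint8_t)(crc >> 8)] ^ (crc << 8));
	return crc;
}

/* Returns 1 if both loops agree on a pseudo-random buffer of len bytes. */
static int crc_match(size_t len)
{
	static unsigned char buf[4096];
	uint32_t seed = 12345u;

	if (len > sizeof(buf))
		return 0;
	for (size_t n = 0; n < len; n++) {
		seed = seed * 1103515245u + 12345u;
		buf[n] = (unsigned char)(seed >> 16);
	}
	return crc_bytewise(0, buf, len) == crc_slice16(0, buf, len);
}
```

After a single call to build_tables(), crc_match(1000) should return 1, since i_last16 and i_end are addresses within the buffer and can span any len. The generated crc_table[1][1] also comes out as 0x7562, matching the second table in the patch.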


Jeff Lien

-----Original Message-----
From: Tim Chen [mailto:[email protected]]
Sent: Monday, August 13, 2018 5:45 PM
To: Jeffrey Lien <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]
Cc: [email protected]; [email protected]; David Darrington <[email protected]>; Jeff Furlong <[email protected]>
Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.

On 08/10/2018 12:12 PM, Jeff Lien wrote:
> This patch provides a performance improvement for the CRC16
> calculations done in read/write workloads using the T10 Type 1/2/3
> guard field. For example, today with sequential write workloads (one
> thread/CPU of IO) we consume 100% of the CPU because of the CRC16
> computation bottleneck. Today's block devices are considerably
> faster, but the CRC16 calculation prevents folks from utilizing the
> throughput of such devices. To speed up this calculation and expose
> the block device throughput, we slice the old single byte for loop
> into a 16 byte for loop, with a larger CRC table to match. The result
> has shown 5x performance improvements on various big endian and
> little endian systems running the 4.18.0 kernel version.
>
> FIO Sequential Write, 64K Block Size, Queue Depth 64
> BE Base Kernel: bw=201.5 MiB/s
> BE Modified CRC Calc: bw=968.1 MiB/s
> 4.80x performance improvement
>
> LE Base Kernel: bw=357 MiB/s
> LE Modified CRC Calc: bw=1964 MiB/s
> 5.51x performance improvement
>
> FIO Sequential Read, 64K Block Size, Queue Depth 64
> BE Base Kernel: bw=611.2 MiB/s
> BE Modified CRC calc: bw=684.9 MiB/s
> 1.12x performance improvement
>
> LE Base Kernel: bw=797 MiB/s
> LE Modified CRC Calc: bw=2730 MiB/s
> 3.42x performance improvement
>
> Reviewed-by: Dave Darrington <[email protected]>
> Reviewed-by: Jeff Furlong <[email protected]>
> Signed-off-by: Jeff Lien <[email protected]>
> ---
> crypto/crct10dif_common.c | 605 +++++++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 569 insertions(+), 36 deletions(-)
>
> diff --git a/crypto/crct10dif_common.c b/crypto/crct10dif_common.c
> index b2fab36..40e1d6c 100644
> --- a/crypto/crct10dif_common.c
> +++ b/crypto/crct10dif_common.c
> @@ -32,47 +32,580 @@
> * x^16 + x^15 + x^11 + x^9 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1
> * gt: 0x8bb7
> */
> -static const __u16 t10_dif_crc_table[256] = {
> - 0x0000, 0x8BB7, 0x9CD9, 0x176E, 0xB205, 0x39B2, 0x2EDC, 0xA56B,
> - 0xEFBD, 0x640A, 0x7364, 0xF8D3, 0x5DB8, 0xD60F, 0xC161, 0x4AD6,
> - 0x54CD, 0xDF7A, 0xC814, 0x43A3, 0xE6C8, 0x6D7F, 0x7A11, 0xF1A6,
> - 0xBB70, 0x30C7, 0x27A9, 0xAC1E, 0x0975, 0x82C2, 0x95AC, 0x1E1B,
> - 0xA99A, 0x222D, 0x3543, 0xBEF4, 0x1B9F, 0x9028, 0x8746, 0x0CF1,
> - 0x4627, 0xCD90, 0xDAFE, 0x5149, 0xF422, 0x7F95, 0x68FB, 0xE34C,
> - 0xFD57, 0x76E0, 0x618E, 0xEA39, 0x4F52, 0xC4E5, 0xD38B, 0x583C,
> - 0x12EA, 0x995D, 0x8E33, 0x0584, 0xA0EF, 0x2B58, 0x3C36, 0xB781,
> - 0xD883, 0x5334, 0x445A, 0xCFED, 0x6A86, 0xE131, 0xF65F, 0x7DE8,
> - 0x373E, 0xBC89, 0xABE7, 0x2050, 0x853B, 0x0E8C, 0x19E2, 0x9255,
> - 0x8C4E, 0x07F9, 0x1097, 0x9B20, 0x3E4B, 0xB5FC, 0xA292, 0x2925,
> - 0x63F3, 0xE844, 0xFF2A, 0x749D, 0xD1F6, 0x5A41, 0x4D2F, 0xC698,
> - 0x7119, 0xFAAE, 0xEDC0, 0x6677, 0xC31C, 0x48AB, 0x5FC5, 0xD472,
> - 0x9EA4, 0x1513, 0x027D, 0x89CA, 0x2CA1, 0xA716, 0xB078, 0x3BCF,
> - 0x25D4, 0xAE63, 0xB90D, 0x32BA, 0x97D1, 0x1C66, 0x0B08, 0x80BF,
> - 0xCA69, 0x41DE, 0x56B0, 0xDD07, 0x786C, 0xF3DB, 0xE4B5, 0x6F02,
> - 0x3AB1, 0xB106, 0xA668, 0x2DDF, 0x88B4, 0x0303, 0x146D, 0x9FDA,
> - 0xD50C, 0x5EBB, 0x49D5, 0xC262, 0x6709, 0xECBE, 0xFBD0, 0x7067,
> - 0x6E7C, 0xE5CB, 0xF2A5, 0x7912, 0xDC79, 0x57CE, 0x40A0, 0xCB17,
> - 0x81C1, 0x0A76, 0x1D18, 0x96AF, 0x33C4, 0xB873, 0xAF1D, 0x24AA,
> - 0x932B, 0x189C, 0x0FF2, 0x8445, 0x212E, 0xAA99, 0xBDF7, 0x3640,
> - 0x7C96, 0xF721, 0xE04F, 0x6BF8, 0xCE93, 0x4524, 0x524A, 0xD9FD,
> - 0xC7E6, 0x4C51, 0x5B3F, 0xD088, 0x75E3, 0xFE54, 0xE93A, 0x628D,
> - 0x285B, 0xA3EC, 0xB482, 0x3F35, 0x9A5E, 0x11E9, 0x0687, 0x8D30,
> - 0xE232, 0x6985, 0x7EEB, 0xF55C, 0x5037, 0xDB80, 0xCCEE, 0x4759,
> - 0x0D8F, 0x8638, 0x9156, 0x1AE1, 0xBF8A, 0x343D, 0x2353, 0xA8E4,
> - 0xB6FF, 0x3D48, 0x2A26, 0xA191, 0x04FA, 0x8F4D, 0x9823, 0x1394,
> - 0x5942, 0xD2F5, 0xC59B, 0x4E2C, 0xEB47, 0x60F0, 0x779E, 0xFC29,
> - 0x4BA8, 0xC01F, 0xD771, 0x5CC6, 0xF9AD, 0x721A, 0x6574, 0xEEC3,
> - 0xA415, 0x2FA2, 0x38CC, 0xB37B, 0x1610, 0x9DA7, 0x8AC9, 0x017E,
> - 0x1F65, 0x94D2, 0x83BC, 0x080B, 0xAD60, 0x26D7, 0x31B9, 0xBA0E,
> - 0xF0D8, 0x7B6F, 0x6C01, 0xE7B6, 0x42DD, 0xC96A, 0xDE04, 0x55B3
> +static const __u16 t10_dif_crc_table[16][256] = {
> + {
> + 0x0000u, 0x8BB7u, 0x9CD9u, 0x176Eu, 0xB205u, 0x39B2u, 0x2EDCu, 0xA56Bu,
> + 0xEFBDu, 0x640Au, 0x7364u, 0xF8D3u, 0x5DB8u, 0xD60Fu, 0xC161u, 0x4AD6u,
> + 0x54CDu, 0xDF7Au, 0xC814u, 0x43A3u, 0xE6C8u, 0x6D7Fu, 0x7A11u, 0xF1A6u,
> + 0xBB70u, 0x30C7u, 0x27A9u, 0xAC1Eu, 0x0975u, 0x82C2u, 0x95ACu, 0x1E1Bu,
> + 0xA99Au, 0x222Du, 0x3543u, 0xBEF4u, 0x1B9Fu, 0x9028u, 0x8746u, 0x0CF1u,
> + 0x4627u, 0xCD90u, 0xDAFEu, 0x5149u, 0xF422u, 0x7F95u, 0x68FBu, 0xE34Cu,
> + 0xFD57u, 0x76E0u, 0x618Eu, 0xEA39u, 0x4F52u, 0xC4E5u, 0xD38Bu, 0x583Cu,
> + 0x12EAu, 0x995Du, 0x8E33u, 0x0584u, 0xA0EFu, 0x2B58u, 0x3C36u, 0xB781u,
> + 0xD883u, 0x5334u, 0x445Au, 0xCFEDu, 0x6A86u, 0xE131u, 0xF65Fu, 0x7DE8u,
> + 0x373Eu, 0xBC89u, 0xABE7u, 0x2050u, 0x853Bu, 0x0E8Cu, 0x19E2u, 0x9255u,
> + 0x8C4Eu, 0x07F9u, 0x1097u, 0x9B20u, 0x3E4Bu, 0xB5FCu, 0xA292u, 0x2925u,
> + 0x63F3u, 0xE844u, 0xFF2Au, 0x749Du, 0xD1F6u, 0x5A41u, 0x4D2Fu, 0xC698u,
> + 0x7119u, 0xFAAEu, 0xEDC0u, 0x6677u, 0xC31Cu, 0x48ABu, 0x5FC5u, 0xD472u,
> + 0x9EA4u, 0x1513u, 0x027Du, 0x89CAu, 0x2CA1u, 0xA716u, 0xB078u, 0x3BCFu,
> + 0x25D4u, 0xAE63u, 0xB90Du, 0x32BAu, 0x97D1u, 0x1C66u, 0x0B08u, 0x80BFu,
> + 0xCA69u, 0x41DEu, 0x56B0u, 0xDD07u, 0x786Cu, 0xF3DBu, 0xE4B5u, 0x6F02u,
> + 0x3AB1u, 0xB106u, 0xA668u, 0x2DDFu, 0x88B4u, 0x0303u, 0x146Du, 0x9FDAu,
> + 0xD50Cu, 0x5EBBu, 0x49D5u, 0xC262u, 0x6709u, 0xECBEu, 0xFBD0u, 0x7067u,
> + 0x6E7Cu, 0xE5CBu, 0xF2A5u, 0x7912u, 0xDC79u, 0x57CEu, 0x40A0u, 0xCB17u,
> + 0x81C1u, 0x0A76u, 0x1D18u, 0x96AFu, 0x33C4u, 0xB873u, 0xAF1Du, 0x24AAu,
> + 0x932Bu, 0x189Cu, 0x0FF2u, 0x8445u, 0x212Eu, 0xAA99u, 0xBDF7u, 0x3640u,
> + 0x7C96u, 0xF721u, 0xE04Fu, 0x6BF8u, 0xCE93u, 0x4524u, 0x524Au, 0xD9FDu,
> + 0xC7E6u, 0x4C51u, 0x5B3Fu, 0xD088u, 0x75E3u, 0xFE54u, 0xE93Au, 0x628Du,
> + 0x285Bu, 0xA3ECu, 0xB482u, 0x3F35u, 0x9A5Eu, 0x11E9u, 0x0687u, 0x8D30u,
> + 0xE232u, 0x6985u, 0x7EEBu, 0xF55Cu, 0x5037u, 0xDB80u, 0xCCEEu, 0x4759u,
> + 0x0D8Fu, 0x8638u, 0x9156u, 0x1AE1u, 0xBF8Au, 0x343Du, 0x2353u, 0xA8E4u,
> + 0xB6FFu, 0x3D48u, 0x2A26u, 0xA191u, 0x04FAu, 0x8F4Du, 0x9823u, 0x1394u,
> + 0x5942u, 0xD2F5u, 0xC59Bu, 0x4E2Cu, 0xEB47u, 0x60F0u, 0x779Eu, 0xFC29u,
> + 0x4BA8u, 0xC01Fu, 0xD771u, 0x5CC6u, 0xF9ADu, 0x721Au, 0x6574u, 0xEEC3u,
> + 0xA415u, 0x2FA2u, 0x38CCu, 0xB37Bu, 0x1610u, 0x9DA7u, 0x8AC9u, 0x017Eu,
> + 0x1F65u, 0x94D2u, 0x83BCu, 0x080Bu, 0xAD60u, 0x26D7u, 0x31B9u, 0xBA0Eu,
> + 0xF0D8u, 0x7B6Fu, 0x6C01u, 0xE7B6u, 0x42DDu, 0xC96Au, 0xDE04u, 0x55B3u
> + },
> + {
> + 0x0000u, 0x7562u, 0xEAC4u, 0x9FA6u, 0x5E3Fu, 0x2B5Du, 0xB4FBu, 0xC199u,
> + 0xBC7Eu, 0xC91Cu, 0x56BAu, 0x23D8u, 0xE241u, 0x9723u, 0x0885u, 0x7DE7u,
> + 0xF34Bu, 0x8629u, 0x198Fu, 0x6CEDu, 0xAD74u, 0xD816u, 0x47B0u, 0x32D2u,
> + 0x4F35u, 0x3A57u, 0xA5F1u, 0xD093u, 0x110Au, 0x6468u, 0xFBCEu, 0x8EACu,
> + 0x6D21u, 0x1843u, 0x87E5u, 0xF287u, 0x331Eu, 0x467Cu, 0xD9DAu, 0xACB8u,
> + 0xD15Fu, 0xA43Du, 0x3B9Bu, 0x4EF9u, 0x8F60u, 0xFA02u, 0x65A4u, 0x10C6u,
> + 0x9E6Au, 0xEB08u, 0x74AEu, 0x01CCu, 0xC055u, 0xB537u, 0x2A91u, 0x5FF3u,
> + 0x2214u, 0x5776u, 0xC8D0u, 0xBDB2u, 0x7C2Bu, 0x0949u, 0x96EFu, 0xE38Du,
> + 0xDA42u, 0xAF20u, 0x3086u, 0x45E4u, 0x847Du, 0xF11Fu, 0x6EB9u, 0x1BDBu,
> + 0x663Cu, 0x135Eu, 0x8CF8u, 0xF99Au, 0x3803u, 0x4D61u, 0xD2C7u, 0xA7A5u,
> + 0x2909u, 0x5C6Bu, 0xC3CDu, 0xB6AFu, 0x7736u, 0x0254u, 0x9DF2u, 0xE890u,
> + 0x9577u, 0xE015u, 0x7FB3u, 0x0AD1u, 0xCB48u, 0xBE2Au, 0x218Cu, 0x54EEu,
> + 0xB763u, 0xC201u, 0x5DA7u, 0x28C5u, 0xE95Cu, 0x9C3Eu, 0x0398u, 0x76FAu,
> + 0x0B1Du, 0x7E7Fu, 0xE1D9u, 0x94BBu, 0x5522u, 0x2040u, 0xBFE6u, 0xCA84u,
> + 0x4428u, 0x314Au, 0xAEECu, 0xDB8Eu, 0x1A17u, 0x6F75u, 0xF0D3u, 0x85B1u,
> + 0xF856u, 0x8D34u, 0x1292u, 0x67F0u, 0xA669u, 0xD30Bu, 0x4CADu, 0x39CFu,
> + 0x3F33u, 0x4A51u, 0xD5F7u, 0xA095u, 0x610Cu, 0x146Eu, 0x8BC8u, 0xFEAAu,
> + 0x834Du, 0xF62Fu, 0x6989u, 0x1CEBu, 0xDD72u, 0xA810u, 0x37B6u, 0x42D4u,
> + 0xCC78u, 0xB91Au, 0x26BCu, 0x53DEu, 0x9247u, 0xE725u, 0x7883u, 0x0DE1u,
> + 0x7006u, 0x0564u, 0x9AC2u, 0xEFA0u, 0x2E39u, 0x5B5Bu, 0xC4FDu, 0xB19Fu,
> + 0x5212u, 0x2770u, 0xB8D6u, 0xCDB4u, 0x0C2Du, 0x794Fu, 0xE6E9u, 0x938Bu,
> + 0xEE6Cu, 0x9B0Eu, 0x04A8u, 0x71CAu, 0xB053u, 0xC531u, 0x5A97u, 0x2FF5u,
> + 0xA159u, 0xD43Bu, 0x4B9Du, 0x3EFFu, 0xFF66u, 0x8A04u, 0x15A2u, 0x60C0u,
> + 0x1D27u, 0x6845u, 0xF7E3u, 0x8281u, 0x4318u, 0x367Au, 0xA9DCu, 0xDCBEu,
> + 0xE571u, 0x9013u, 0x0FB5u, 0x7AD7u, 0xBB4Eu, 0xCE2Cu, 0x518Au, 0x24E8u,
> + 0x590Fu, 0x2C6Du, 0xB3CBu, 0xC6A9u, 0x0730u, 0x7252u, 0xEDF4u, 0x9896u,
> + 0x163Au, 0x6358u, 0xFCFEu, 0x899Cu, 0x4805u, 0x3D67u, 0xA2C1u, 0xD7A3u,
> + 0xAA44u, 0xDF26u, 0x4080u, 0x35E2u, 0xF47Bu, 0x8119u, 0x1EBFu, 0x6BDDu,
> + 0x8850u, 0xFD32u, 0x6294u, 0x17F6u, 0xD66Fu, 0xA30Du, 0x3CABu, 0x49C9u,
> + 0x342Eu, 0x414Cu, 0xDEEAu, 0xAB88u, 0x6A11u, 0x1F73u, 0x80D5u, 0xF5B7u,
> + 0x7B1Bu, 0x0E79u, 0x91DFu, 0xE4BDu, 0x2524u, 0x5046u, 0xCFE0u, 0xBA82u,
> + 0xC765u, 0xB207u, 0x2DA1u, 0x58C3u, 0x995Au, 0xEC38u, 0x739Eu, 0x06FCu
> + },
> + {
> + 0x0000u, 0x7E66u, 0xFCCCu, 0x82AAu, 0x722Fu, 0x0C49u, 0x8EE3u, 0xF085u,
> + 0xE45Eu, 0x9A38u, 0x1892u, 0x66F4u, 0x9671u, 0xE817u, 0x6ABDu, 0x14DBu,
> + 0x430Bu, 0x3D6Du, 0xBFC7u, 0xC1A1u, 0x3124u, 0x4F42u, 0xCDE8u, 0xB38Eu,
> + 0xA755u, 0xD933u, 0x5B99u, 0x25FFu, 0xD57Au, 0xAB1Cu, 0x29B6u, 0x57D0u,
> + 0x8616u, 0xF870u, 0x7ADAu, 0x04BCu, 0xF439u, 0x8A5Fu, 0x08F5u, 0x7693u,
> + 0x6248u, 0x1C2Eu, 0x9E84u, 0xE0E2u, 0x1067u, 0x6E01u, 0xECABu, 0x92CDu,
> + 0xC51Du, 0xBB7Bu, 0x39D1u, 0x47B7u, 0xB732u, 0xC954u, 0x4BFEu, 0x3598u,
> + 0x2143u, 0x5F25u, 0xDD8Fu, 0xA3E9u, 0x536Cu, 0x2D0Au, 0xAFA0u, 0xD1C6u,
> + 0x879Bu, 0xF9FDu, 0x7B57u, 0x0531u, 0xF5B4u, 0x8BD2u, 0x0978u, 0x771Eu,
> + 0x63C5u, 0x1DA3u, 0x9F09u, 0xE16Fu, 0x11EAu, 0x6F8Cu, 0xED26u, 0x9340u,
> + 0xC490u, 0xBAF6u, 0x385Cu, 0x463Au, 0xB6BFu, 0xC8D9u, 0x4A73u, 0x3415u,
> + 0x20CEu, 0x5EA8u, 0xDC02u, 0xA264u, 0x52E1u, 0x2C87u, 0xAE2Du, 0xD04Bu,
> + 0x018Du, 0x7FEBu, 0xFD41u, 0x8327u, 0x73A2u, 0x0DC4u, 0x8F6Eu, 0xF108u,
> + 0xE5D3u, 0x9BB5u, 0x191Fu, 0x6779u, 0x97FCu, 0xE99Au, 0x6B30u, 0x1556u,
> + 0x4286u, 0x3CE0u, 0xBE4Au, 0xC02Cu, 0x30A9u, 0x4ECFu, 0xCC65u, 0xB203u,
> + 0xA6D8u, 0xD8BEu, 0x5A14u, 0x2472u, 0xD4F7u, 0xAA91u, 0x283Bu, 0x565Du,
> + 0x8481u, 0xFAE7u, 0x784Du, 0x062Bu, 0xF6AEu, 0x88C8u, 0x0A62u, 0x7404u,
> + 0x60DFu, 0x1EB9u, 0x9C13u, 0xE275u, 0x12F0u, 0x6C96u, 0xEE3Cu, 0x905Au,
> + 0xC78Au, 0xB9ECu, 0x3B46u, 0x4520u, 0xB5A5u, 0xCBC3u, 0x4969u, 0x370Fu,
> + 0x23D4u, 0x5DB2u, 0xDF18u, 0xA17Eu, 0x51FBu, 0x2F9Du, 0xAD37u, 0xD351u,
> + 0x0297u, 0x7CF1u, 0xFE5Bu, 0x803Du, 0x70B8u, 0x0EDEu, 0x8C74u, 0xF212u,
> + 0xE6C9u, 0x98AFu, 0x1A05u, 0x6463u, 0x94E6u, 0xEA80u, 0x682Au, 0x164Cu,
> + 0x419Cu, 0x3FFAu, 0xBD50u, 0xC336u, 0x33B3u, 0x4DD5u, 0xCF7Fu, 0xB119u,
> + 0xA5C2u, 0xDBA4u, 0x590Eu, 0x2768u, 0xD7EDu, 0xA98Bu, 0x2B21u, 0x5547u,
> + 0x031Au, 0x7D7Cu, 0xFFD6u, 0x81B0u, 0x7135u, 0x0F53u, 0x8DF9u, 0xF39Fu,
> + 0xE744u, 0x9922u, 0x1B88u, 0x65EEu, 0x956Bu, 0xEB0Du, 0x69A7u, 0x17C1u,
> + 0x4011u, 0x3E77u, 0xBCDDu, 0xC2BBu, 0x323Eu, 0x4C58u, 0xCEF2u, 0xB094u,
> + 0xA44Fu, 0xDA29u, 0x5883u, 0x26E5u, 0xD660u, 0xA806u, 0x2AACu, 0x54CAu,
> + 0x850Cu, 0xFB6Au, 0x79C0u, 0x07A6u, 0xF723u, 0x8945u, 0x0BEFu, 0x7589u,
> + 0x6152u, 0x1F34u, 0x9D9Eu, 0xE3F8u, 0x137Du, 0x6D1Bu, 0xEFB1u, 0x91D7u,
> + 0xC607u, 0xB861u, 0x3ACBu, 0x44ADu, 0xB428u, 0xCA4Eu, 0x48E4u, 0x3682u,
> + 0x2259u, 0x5C3Fu, 0xDE95u, 0xA0F3u, 0x5076u, 0x2E10u, 0xACBAu, 0xD2DCu
> + },
> + {
> + 0x0000u, 0x82B5u, 0x8EDDu, 0x0C68u, 0x960Du, 0x14B8u, 0x18D0u, 0x9A65u,
> + 0xA7ADu, 0x2518u, 0x2970u, 0xABC5u, 0x31A0u, 0xB315u, 0xBF7Du, 0x3DC8u,
> + 0xC4EDu, 0x4658u, 0x4A30u, 0xC885u, 0x52E0u, 0xD055u, 0xDC3Du, 0x5E88u,
> + 0x6340u, 0xE1F5u, 0xED9Du, 0x6F28u, 0xF54Du, 0x77F8u, 0x7B90u, 0xF925u,
> + 0x026Du, 0x80D8u, 0x8CB0u, 0x0E05u, 0x9460u, 0x16D5u, 0x1ABDu, 0x9808u,
> + 0xA5C0u, 0x2775u, 0x2B1Du, 0xA9A8u, 0x33CDu, 0xB178u, 0xBD10u, 0x3FA5u,
> + 0xC680u, 0x4435u, 0x485Du, 0xCAE8u, 0x508Du, 0xD238u, 0xDE50u, 0x5CE5u,
> + 0x612Du, 0xE398u, 0xEFF0u, 0x6D45u, 0xF720u, 0x7595u, 0x79FDu, 0xFB48u,
> + 0x04DAu, 0x866Fu, 0x8A07u, 0x08B2u, 0x92D7u, 0x1062u, 0x1C0Au, 0x9EBFu,
> + 0xA377u, 0x21C2u, 0x2DAAu, 0xAF1Fu, 0x357Au, 0xB7CFu, 0xBBA7u, 0x3912u,
> + 0xC037u, 0x4282u, 0x4EEAu, 0xCC5Fu, 0x563Au, 0xD48Fu, 0xD8E7u, 0x5A52u,
> + 0x679Au, 0xE52Fu, 0xE947u, 0x6BF2u, 0xF197u, 0x7322u, 0x7F4Au, 0xFDFFu,
> + 0x06B7u, 0x8402u, 0x886Au, 0x0ADFu, 0x90BAu, 0x120Fu, 0x1E67u, 0x9CD2u,
> + 0xA11Au, 0x23AFu, 0x2FC7u, 0xAD72u, 0x3717u, 0xB5A2u, 0xB9CAu, 0x3B7Fu,
> + 0xC25Au, 0x40EFu, 0x4C87u, 0xCE32u, 0x5457u, 0xD6E2u, 0xDA8Au, 0x583Fu,
> + 0x65F7u, 0xE742u, 0xEB2Au, 0x699Fu, 0xF3FAu, 0x714Fu, 0x7D27u, 0xFF92u,
> + 0x09B4u, 0x8B01u, 0x8769u, 0x05DCu, 0x9FB9u, 0x1D0Cu, 0x1164u, 0x93D1u,
> + 0xAE19u, 0x2CACu, 0x20C4u, 0xA271u, 0x3814u, 0xBAA1u, 0xB6C9u, 0x347Cu,
> + 0xCD59u, 0x4FECu, 0x4384u, 0xC131u, 0x5B54u, 0xD9E1u, 0xD589u, 0x573Cu,
> + 0x6AF4u, 0xE841u, 0xE429u, 0x669Cu, 0xFCF9u, 0x7E4Cu, 0x7224u, 0xF091u,
> + 0x0BD9u, 0x896Cu, 0x8504u, 0x07B1u, 0x9DD4u, 0x1F61u, 0x1309u, 0x91BCu,
> + 0xAC74u, 0x2EC1u, 0x22A9u, 0xA01Cu, 0x3A79u, 0xB8CCu, 0xB4A4u, 0x3611u,
> + 0xCF34u, 0x4D81u, 0x41E9u, 0xC35Cu, 0x5939u, 0xDB8Cu, 0xD7E4u, 0x5551u,
> + 0x6899u, 0xEA2Cu, 0xE644u, 0x64F1u, 0xFE94u, 0x7C21u, 0x7049u, 0xF2FCu,
> + 0x0D6Eu, 0x8FDBu, 0x83B3u, 0x0106u, 0x9B63u, 0x19D6u, 0x15BEu, 0x970Bu,
> + 0xAAC3u, 0x2876u, 0x241Eu, 0xA6ABu, 0x3CCEu, 0xBE7Bu, 0xB213u, 0x30A6u,
> + 0xC983u, 0x4B36u, 0x475Eu, 0xC5EBu, 0x5F8Eu, 0xDD3Bu, 0xD153u, 0x53E6u,
> + 0x6E2Eu, 0xEC9Bu, 0xE0F3u, 0x6246u, 0xF823u, 0x7A96u, 0x76FEu, 0xF44Bu,
> + 0x0F03u, 0x8DB6u, 0x81DEu, 0x036Bu, 0x990Eu, 0x1BBBu, 0x17D3u, 0x9566u,
> + 0xA8AEu, 0x2A1Bu, 0x2673u, 0xA4C6u, 0x3EA3u, 0xBC16u, 0xB07Eu, 0x32CBu,
> + 0xCBEEu, 0x495Bu, 0x4533u, 0xC786u, 0x5DE3u, 0xDF56u, 0xD33Eu, 0x518Bu,
> + 0x6C43u, 0xEEF6u, 0xE29Eu, 0x602Bu, 0xFA4Eu, 0x78FBu, 0x7493u, 0xF626u
> + },
> + {
> + 0x0000u, 0x1368u, 0x26D0u, 0x35B8u, 0x4DA0u, 0x5EC8u, 0x6B70u, 0x7818u,
> + 0x9B40u, 0x8828u, 0xBD90u, 0xAEF8u, 0xD6E0u, 0xC588u, 0xF030u, 0xE358u,
> + 0xBD37u, 0xAE5Fu, 0x9BE7u, 0x888Fu, 0xF097u, 0xE3FFu, 0xD647u, 0xC52Fu,
> + 0x2677u, 0x351Fu, 0x00A7u, 0x13CFu, 0x6BD7u, 0x78BFu, 0x4D07u, 0x5E6Fu,
> + 0xF1D9u, 0xE2B1u, 0xD709u, 0xC461u, 0xBC79u, 0xAF11u, 0x9AA9u, 0x89C1u,
> + 0x6A99u, 0x79F1u, 0x4C49u, 0x5F21u, 0x2739u, 0x3451u, 0x01E9u, 0x1281u,
> + 0x4CEEu, 0x5F86u, 0x6A3Eu, 0x7956u, 0x014Eu, 0x1226u, 0x279Eu, 0x34F6u,
> + 0xD7AEu, 0xC4C6u, 0xF17Eu, 0xE216u, 0x9A0Eu, 0x8966u, 0xBCDEu, 0xAFB6u,
> + 0x6805u, 0x7B6Du, 0x4ED5u, 0x5DBDu, 0x25A5u, 0x36CDu, 0x0375u, 0x101Du,
> + 0xF345u, 0xE02Du, 0xD595u, 0xC6FDu, 0xBEE5u, 0xAD8Du, 0x9835u, 0x8B5Du,
> + 0xD532u, 0xC65Au, 0xF3E2u, 0xE08Au, 0x9892u, 0x8BFAu, 0xBE42u, 0xAD2Au,
> + 0x4E72u, 0x5D1Au, 0x68A2u, 0x7BCAu, 0x03D2u, 0x10BAu, 0x2502u, 0x366Au,
> + 0x99DCu, 0x8AB4u, 0xBF0Cu, 0xAC64u, 0xD47Cu, 0xC714u, 0xF2ACu, 0xE1C4u,
> + 0x029Cu, 0x11F4u, 0x244Cu, 0x3724u, 0x4F3Cu, 0x5C54u, 0x69ECu, 0x7A84u,
> + 0x24EBu, 0x3783u, 0x023Bu, 0x1153u, 0x694Bu, 0x7A23u, 0x4F9Bu, 0x5CF3u,
> + 0xBFABu, 0xACC3u, 0x997Bu, 0x8A13u, 0xF20Bu, 0xE163u, 0xD4DBu, 0xC7B3u,
> + 0xD00Au, 0xC362u, 0xF6DAu, 0xE5B2u, 0x9DAAu, 0x8EC2u, 0xBB7Au, 0xA812u,
> + 0x4B4Au, 0x5822u, 0x6D9Au, 0x7EF2u, 0x06EAu, 0x1582u, 0x203Au, 0x3352u,
> + 0x6D3Du, 0x7E55u, 0x4BEDu, 0x5885u, 0x209Du, 0x33F5u, 0x064Du, 0x1525u,
> + 0xF67Du, 0xE515u, 0xD0ADu, 0xC3C5u, 0xBBDDu, 0xA8B5u, 0x9D0Du, 0x8E65u,
> + 0x21D3u, 0x32BBu, 0x0703u, 0x146Bu, 0x6C73u, 0x7F1Bu, 0x4AA3u, 0x59CBu,
> + 0xBA93u, 0xA9FBu, 0x9C43u, 0x8F2Bu, 0xF733u, 0xE45Bu, 0xD1E3u, 0xC28Bu,
> + 0x9CE4u, 0x8F8Cu, 0xBA34u, 0xA95Cu, 0xD144u, 0xC22Cu, 0xF794u, 0xE4FCu,
> + 0x07A4u, 0x14CCu, 0x2174u, 0x321Cu, 0x4A04u, 0x596Cu, 0x6CD4u, 0x7FBCu,
> + 0xB80Fu, 0xAB67u, 0x9EDFu, 0x8DB7u, 0xF5AFu, 0xE6C7u, 0xD37Fu, 0xC017u,
> + 0x234Fu, 0x3027u, 0x059Fu, 0x16F7u, 0x6EEFu, 0x7D87u, 0x483Fu, 0x5B57u,
> + 0x0538u, 0x1650u, 0x23E8u, 0x3080u, 0x4898u, 0x5BF0u, 0x6E48u, 0x7D20u,
> + 0x9E78u, 0x8D10u, 0xB8A8u, 0xABC0u, 0xD3D8u, 0xC0B0u, 0xF508u, 0xE660u,
> + 0x49D6u, 0x5ABEu, 0x6F06u, 0x7C6Eu, 0x0476u, 0x171Eu, 0x22A6u, 0x31CEu,
> + 0xD296u, 0xC1FEu, 0xF446u, 0xE72Eu, 0x9F36u, 0x8C5Eu, 0xB9E6u, 0xAA8Eu,
> + 0xF4E1u, 0xE789u, 0xD231u, 0xC159u, 0xB941u, 0xAA29u, 0x9F91u, 0x8CF9u,
> + 0x6FA1u, 0x7CC9u, 0x4971u, 0x5A19u, 0x2201u, 0x3169u, 0x04D1u, 0x17B9u
> + },
> + {
> + 0x0000u, 0x2BA3u, 0x5746u, 0x7CE5u, 0xAE8Cu, 0x852Fu, 0xF9CAu, 0xD269u,
> + 0xD6AFu, 0xFD0Cu, 0x81E9u, 0xAA4Au, 0x7823u, 0x5380u, 0x2F65u, 0x04C6u,
> + 0x26E9u, 0x0D4Au, 0x71AFu, 0x5A0Cu, 0x8865u, 0xA3C6u, 0xDF23u, 0xF480u,
> + 0xF046u, 0xDBE5u, 0xA700u, 0x8CA3u, 0x5ECAu, 0x7569u, 0x098Cu, 0x222Fu,
> + 0x4DD2u, 0x6671u, 0x1A94u, 0x3137u, 0xE35Eu, 0xC8FDu, 0xB418u, 0x9FBBu,
> + 0x9B7Du, 0xB0DEu, 0xCC3Bu, 0xE798u, 0x35F1u, 0x1E52u, 0x62B7u, 0x4914u,
> + 0x6B3Bu, 0x4098u, 0x3C7Du, 0x17DEu, 0xC5B7u, 0xEE14u, 0x92F1u, 0xB952u,
> + 0xBD94u, 0x9637u, 0xEAD2u, 0xC171u, 0x1318u, 0x38BBu, 0x445Eu, 0x6FFDu,
> + 0x9BA4u, 0xB007u, 0xCCE2u, 0xE741u, 0x3528u, 0x1E8Bu, 0x626Eu, 0x49CDu,
> + 0x4D0Bu, 0x66A8u, 0x1A4Du, 0x31EEu, 0xE387u, 0xC824u, 0xB4C1u, 0x9F62u,
> + 0xBD4Du, 0x96EEu, 0xEA0Bu, 0xC1A8u, 0x13C1u, 0x3862u, 0x4487u, 0x6F24u,
> + 0x6BE2u, 0x4041u, 0x3CA4u, 0x1707u, 0xC56Eu, 0xEECDu, 0x9228u, 0xB98Bu,
> + 0xD676u, 0xFDD5u, 0x8130u, 0xAA93u, 0x78FAu, 0x5359u, 0x2FBCu, 0x041Fu,
> + 0x00D9u, 0x2B7Au, 0x579Fu, 0x7C3Cu, 0xAE55u, 0x85F6u, 0xF913u, 0xD2B0u,
> + 0xF09Fu, 0xDB3Cu, 0xA7D9u, 0x8C7Au, 0x5E13u, 0x75B0u, 0x0955u, 0x22F6u,
> + 0x2630u, 0x0D93u, 0x7176u, 0x5AD5u, 0x88BCu, 0xA31Fu, 0xDFFAu, 0xF459u,
> + 0xBCFFu, 0x975Cu, 0xEBB9u, 0xC01Au, 0x1273u, 0x39D0u, 0x4535u, 0x6E96u,
> + 0x6A50u, 0x41F3u, 0x3D16u, 0x16B5u, 0xC4DCu, 0xEF7Fu, 0x939Au, 0xB839u,
> + 0x9A16u, 0xB1B5u, 0xCD50u, 0xE6F3u, 0x349Au, 0x1F39u, 0x63DCu, 0x487Fu,
> + 0x4CB9u, 0x671Au, 0x1BFFu, 0x305Cu, 0xE235u, 0xC996u, 0xB573u, 0x9ED0u,
> + 0xF12Du, 0xDA8Eu, 0xA66Bu, 0x8DC8u, 0x5FA1u, 0x7402u, 0x08E7u, 0x2344u,
> + 0x2782u, 0x0C21u, 0x70C4u, 0x5B67u, 0x890Eu, 0xA2ADu, 0xDE48u, 0xF5EBu,
> + 0xD7C4u, 0xFC67u, 0x8082u, 0xAB21u, 0x7948u, 0x52EBu, 0x2E0Eu, 0x05ADu,
> + 0x016Bu, 0x2AC8u, 0x562Du, 0x7D8Eu, 0xAFE7u, 0x8444u, 0xF8A1u, 0xD302u,
> + 0x275Bu, 0x0CF8u, 0x701Du, 0x5BBEu, 0x89D7u, 0xA274u, 0xDE91u, 0xF532u,
> + 0xF1F4u, 0xDA57u, 0xA6B2u, 0x8D11u, 0x5F78u, 0x74DBu, 0x083Eu, 0x239Du,
> + 0x01B2u, 0x2A11u, 0x56F4u, 0x7D57u, 0xAF3Eu, 0x849Du, 0xF878u, 0xD3DBu,
> + 0xD71Du, 0xFCBEu, 0x805Bu, 0xABF8u, 0x7991u, 0x5232u, 0x2ED7u, 0x0574u,
> + 0x6A89u, 0x412Au, 0x3DCFu, 0x166Cu, 0xC405u, 0xEFA6u, 0x9343u, 0xB8E0u,
> + 0xBC26u, 0x9785u, 0xEB60u, 0xC0C3u, 0x12AAu, 0x3909u, 0x45ECu, 0x6E4Fu,
> + 0x4C60u, 0x67C3u, 0x1B26u, 0x3085u, 0xE2ECu, 0xC94Fu, 0xB5AAu, 0x9E09u,
> + 0x9ACFu, 0xB16Cu, 0xCD89u, 0xE62Au, 0x3443u, 0x1FE0u, 0x6305u, 0x48A6u
> + },
> + {
> + 0x0000u, 0xF249u, 0x6F25u, 0x9D6Cu, 0xDE4Au, 0x2C03u, 0xB16Fu, 0x4326u,
> + 0x3723u, 0xC56Au, 0x5806u, 0xAA4Fu, 0xE969u, 0x1B20u, 0x864Cu, 0x7405u,
> + 0x6E46u, 0x9C0Fu, 0x0163u, 0xF32Au, 0xB00Cu, 0x4245u, 0xDF29u, 0x2D60u,
> + 0x5965u, 0xAB2Cu, 0x3640u, 0xC409u, 0x872Fu, 0x7566u, 0xE80Au, 0x1A43u,
> + 0xDC8Cu, 0x2EC5u, 0xB3A9u, 0x41E0u, 0x02C6u, 0xF08Fu, 0x6DE3u, 0x9FAAu,
> + 0xEBAFu, 0x19E6u, 0x848Au, 0x76C3u, 0x35E5u, 0xC7ACu, 0x5AC0u, 0xA889u,
> + 0xB2CAu, 0x4083u, 0xDDEFu, 0x2FA6u, 0x6C80u, 0x9EC9u, 0x03A5u, 0xF1ECu,
> + 0x85E9u, 0x77A0u, 0xEACCu, 0x1885u, 0x5BA3u, 0xA9EAu, 0x3486u, 0xC6CFu,
> + 0x32AFu, 0xC0E6u, 0x5D8Au, 0xAFC3u, 0xECE5u, 0x1EACu, 0x83C0u, 0x7189u,
> + 0x058Cu, 0xF7C5u, 0x6AA9u, 0x98E0u, 0xDBC6u, 0x298Fu, 0xB4E3u, 0x46AAu,
> + 0x5CE9u, 0xAEA0u, 0x33CCu, 0xC185u, 0x82A3u, 0x70EAu, 0xED86u, 0x1FCFu,
> + 0x6BCAu, 0x9983u, 0x04EFu, 0xF6A6u, 0xB580u, 0x47C9u, 0xDAA5u, 0x28ECu,
> + 0xEE23u, 0x1C6Au, 0x8106u, 0x734Fu, 0x3069u, 0xC220u, 0x5F4Cu, 0xAD05u,
> + 0xD900u, 0x2B49u, 0xB625u, 0x446Cu, 0x074Au, 0xF503u, 0x686Fu, 0x9A26u,
> + 0x8065u, 0x722Cu, 0xEF40u, 0x1D09u, 0x5E2Fu, 0xAC66u, 0x310Au, 0xC343u,
> + 0xB746u, 0x450Fu, 0xD863u, 0x2A2Au, 0x690Cu, 0x9B45u, 0x0629u, 0xF460u,
> + 0x655Eu, 0x9717u, 0x0A7Bu, 0xF832u, 0xBB14u, 0x495Du, 0xD431u, 0x2678u,
> + 0x527Du, 0xA034u, 0x3D58u, 0xCF11u, 0x8C37u, 0x7E7Eu, 0xE312u, 0x115Bu,
> + 0x0B18u, 0xF951u, 0x643Du, 0x9674u, 0xD552u, 0x271Bu, 0xBA77u, 0x483Eu,
> + 0x3C3Bu, 0xCE72u, 0x531Eu, 0xA157u, 0xE271u, 0x1038u, 0x8D54u, 0x7F1Du,
> + 0xB9D2u, 0x4B9Bu, 0xD6F7u, 0x24BEu, 0x6798u, 0x95D1u, 0x08BDu, 0xFAF4u,
> + 0x8EF1u, 0x7CB8u, 0xE1D4u, 0x139Du, 0x50BBu, 0xA2F2u, 0x3F9Eu, 0xCDD7u,
> + 0xD794u, 0x25DDu, 0xB8B1u, 0x4AF8u, 0x09DEu, 0xFB97u, 0x66FBu, 0x94B2u,
> + 0xE0B7u, 0x12FEu, 0x8F92u, 0x7DDBu, 0x3EFDu, 0xCCB4u, 0x51D8u, 0xA391u,
> + 0x57F1u, 0xA5B8u, 0x38D4u, 0xCA9Du, 0x89BBu, 0x7BF2u, 0xE69Eu, 0x14D7u,
> + 0x60D2u, 0x929Bu, 0x0FF7u, 0xFDBEu, 0xBE98u, 0x4CD1u, 0xD1BDu, 0x23F4u,
> + 0x39B7u, 0xCBFEu, 0x5692u, 0xA4DBu, 0xE7FDu, 0x15B4u, 0x88D8u, 0x7A91u,
> + 0x0E94u, 0xFCDDu, 0x61B1u, 0x93F8u, 0xD0DEu, 0x2297u, 0xBFFBu, 0x4DB2u,
> + 0x8B7Du, 0x7934u, 0xE458u, 0x1611u, 0x5537u, 0xA77Eu, 0x3A12u, 0xC85Bu,
> + 0xBC5Eu, 0x4E17u, 0xD37Bu, 0x2132u, 0x6214u, 0x905Du, 0x0D31u, 0xFF78u,
> + 0xE53Bu, 0x1772u, 0x8A1Eu, 0x7857u, 0x3B71u, 0xC938u, 0x5454u, 0xA61Du,
> + 0xD218u, 0x2051u, 0xBD3Du, 0x4F74u, 0x0C52u, 0xFE1Bu, 0x6377u, 0x913Eu
> + },
> + {
> + 0x0000u, 0xCABCu, 0x1ECFu, 0xD473u, 0x3D9Eu, 0xF722u, 0x2351u, 0xE9EDu,
> + 0x7B3Cu, 0xB180u, 0x65F3u, 0xAF4Fu, 0x46A2u, 0x8C1Eu, 0x586Du, 0x92D1u,
> + 0xF678u, 0x3CC4u, 0xE8B7u, 0x220Bu, 0xCBE6u, 0x015Au, 0xD529u, 0x1F95u,
> + 0x8D44u, 0x47F8u, 0x938Bu, 0x5937u, 0xB0DAu, 0x7A66u, 0xAE15u, 0x64A9u,
> + 0x6747u, 0xADFBu, 0x7988u, 0xB334u, 0x5AD9u, 0x9065u, 0x4416u, 0x8EAAu,
> + 0x1C7Bu, 0xD6C7u, 0x02B4u, 0xC808u, 0x21E5u, 0xEB59u, 0x3F2Au, 0xF596u,
> + 0x913Fu, 0x5B83u, 0x8FF0u, 0x454Cu, 0xACA1u, 0x661Du, 0xB26Eu, 0x78D2u,
> + 0xEA03u, 0x20BFu, 0xF4CCu, 0x3E70u, 0xD79Du, 0x1D21u, 0xC952u, 0x03EEu,
> + 0xCE8Eu, 0x0432u, 0xD041u, 0x1AFDu, 0xF310u, 0x39ACu, 0xEDDFu, 0x2763u,
> + 0xB5B2u, 0x7F0Eu, 0xAB7Du, 0x61C1u, 0x882Cu, 0x4290u, 0x96E3u, 0x5C5Fu,
> + 0x38F6u, 0xF24Au, 0x2639u, 0xEC85u, 0x0568u, 0xCFD4u, 0x1BA7u, 0xD11Bu,
> + 0x43CAu, 0x8976u, 0x5D05u, 0x97B9u, 0x7E54u, 0xB4E8u, 0x609Bu, 0xAA27u,
> + 0xA9C9u, 0x6375u, 0xB706u, 0x7DBAu, 0x9457u, 0x5EEBu, 0x8A98u, 0x4024u,
> + 0xD2F5u, 0x1849u, 0xCC3Au, 0x0686u, 0xEF6Bu, 0x25D7u, 0xF1A4u, 0x3B18u,
> + 0x5FB1u, 0x950Du, 0x417Eu, 0x8BC2u, 0x622Fu, 0xA893u, 0x7CE0u, 0xB65Cu,
> + 0x248Du, 0xEE31u, 0x3A42u, 0xF0FEu, 0x1913u, 0xD3AFu, 0x07DCu, 0xCD60u,
> + 0x16ABu, 0xDC17u, 0x0864u, 0xC2D8u, 0x2B35u, 0xE189u, 0x35FAu, 0xFF46u,
> + 0x6D97u, 0xA72Bu, 0x7358u, 0xB9E4u, 0x5009u, 0x9AB5u, 0x4EC6u, 0x847Au,
> + 0xE0D3u, 0x2A6Fu, 0xFE1Cu, 0x34A0u, 0xDD4Du, 0x17F1u, 0xC382u, 0x093Eu,
> + 0x9BEFu, 0x5153u, 0x8520u, 0x4F9Cu, 0xA671u, 0x6CCDu, 0xB8BEu, 0x7202u,
> + 0x71ECu, 0xBB50u, 0x6F23u, 0xA59Fu, 0x4C72u, 0x86CEu, 0x52BDu, 0x9801u,
> + 0x0AD0u, 0xC06Cu, 0x141Fu, 0xDEA3u, 0x374Eu, 0xFDF2u, 0x2981u, 0xE33Du,
> + 0x8794u, 0x4D28u, 0x995Bu, 0x53E7u, 0xBA0Au, 0x70B6u, 0xA4C5u, 0x6E79u,
> + 0xFCA8u, 0x3614u, 0xE267u, 0x28DBu, 0xC136u, 0x0B8Au, 0xDFF9u, 0x1545u,
> + 0xD825u, 0x1299u, 0xC6EAu, 0x0C56u, 0xE5BBu, 0x2F07u, 0xFB74u, 0x31C8u,
> + 0xA319u, 0x69A5u, 0xBDD6u, 0x776Au, 0x9E87u, 0x543Bu, 0x8048u, 0x4AF4u,
> + 0x2E5Du, 0xE4E1u, 0x3092u, 0xFA2Eu, 0x13C3u, 0xD97Fu, 0x0D0Cu, 0xC7B0u,
> + 0x5561u, 0x9FDDu, 0x4BAEu, 0x8112u, 0x68FFu, 0xA243u, 0x7630u, 0xBC8Cu,
> + 0xBF62u, 0x75DEu, 0xA1ADu, 0x6B11u, 0x82FCu, 0x4840u, 0x9C33u, 0x568Fu,
> + 0xC45Eu, 0x0EE2u, 0xDA91u, 0x102Du, 0xF9C0u, 0x337Cu, 0xE70Fu, 0x2DB3u,
> + 0x491Au, 0x83A6u, 0x57D5u, 0x9D69u, 0x7484u, 0xBE38u, 0x6A4Bu, 0xA0F7u,
> + 0x3226u, 0xF89Au, 0x2CE9u, 0xE655u, 0x0FB8u, 0xC504u, 0x1177u, 0xDBCBu
> + },
> + {
> + 0x0000u, 0x2D56u, 0x5AACu, 0x77FAu, 0xB558u, 0x980Eu, 0xEFF4u, 0xC2A2u,
> + 0xE107u, 0xCC51u, 0xBBABu, 0x96FDu, 0x545Fu, 0x7909u, 0x0EF3u, 0x23A5u,
> + 0x49B9u, 0x64EFu, 0x1315u, 0x3E43u, 0xFCE1u, 0xD1B7u, 0xA64Du, 0x8B1Bu,
> + 0xA8BEu, 0x85E8u, 0xF212u, 0xDF44u, 0x1DE6u, 0x30B0u, 0x474Au, 0x6A1Cu,
> + 0x9372u, 0xBE24u, 0xC9DEu, 0xE488u, 0x262Au, 0x0B7Cu, 0x7C86u, 0x51D0u,
> + 0x7275u, 0x5F23u, 0x28D9u, 0x058Fu, 0xC72Du, 0xEA7Bu, 0x9D81u, 0xB0D7u,
> + 0xDACBu, 0xF79Du, 0x8067u, 0xAD31u, 0x6F93u, 0x42C5u, 0x353Fu, 0x1869u,
> + 0x3BCCu, 0x169Au, 0x6160u, 0x4C36u, 0x8E94u, 0xA3C2u, 0xD438u, 0xF96Eu,
> + 0xAD53u, 0x8005u, 0xF7FFu, 0xDAA9u, 0x180Bu, 0x355Du, 0x42A7u, 0x6FF1u,
> + 0x4C54u, 0x6102u, 0x16F8u, 0x3BAEu, 0xF90Cu, 0xD45Au, 0xA3A0u, 0x8EF6u,
> + 0xE4EAu, 0xC9BCu, 0xBE46u, 0x9310u, 0x51B2u, 0x7CE4u, 0x0B1Eu, 0x2648u,
> + 0x05EDu, 0x28BBu, 0x5F41u, 0x7217u, 0xB0B5u, 0x9DE3u, 0xEA19u, 0xC74Fu,
> + 0x3E21u, 0x1377u, 0x648Du, 0x49DBu, 0x8B79u, 0xA62Fu, 0xD1D5u, 0xFC83u,
> + 0xDF26u, 0xF270u, 0x858Au, 0xA8DCu, 0x6A7Eu, 0x4728u, 0x30D2u, 0x1D84u,
> + 0x7798u, 0x5ACEu, 0x2D34u, 0x0062u, 0xC2C0u, 0xEF96u, 0x986Cu, 0xB53Au,
> + 0x969Fu, 0xBBC9u, 0xCC33u, 0xE165u, 0x23C7u, 0x0E91u, 0x796Bu, 0x543Du,
> + 0xD111u, 0xFC47u, 0x8BBDu, 0xA6EBu, 0x6449u, 0x491Fu, 0x3EE5u, 0x13B3u,
> + 0x3016u, 0x1D40u, 0x6ABAu, 0x47ECu, 0x854Eu, 0xA818u, 0xDFE2u, 0xF2B4u,
> + 0x98A8u, 0xB5FEu, 0xC204u, 0xEF52u, 0x2DF0u, 0x00A6u, 0x775Cu, 0x5A0Au,
> + 0x79AFu, 0x54F9u, 0x2303u, 0x0E55u, 0xCCF7u, 0xE1A1u, 0x965Bu, 0xBB0Du,
> + 0x4263u, 0x6F35u, 0x18CFu, 0x3599u, 0xF73Bu, 0xDA6Du, 0xAD97u, 0x80C1u,
> + 0xA364u, 0x8E32u, 0xF9C8u, 0xD49Eu, 0x163Cu, 0x3B6Au, 0x4C90u, 0x61C6u,
> + 0x0BDAu, 0x268Cu, 0x5176u, 0x7C20u, 0xBE82u, 0x93D4u, 0xE42Eu, 0xC978u,
> + 0xEADDu, 0xC78Bu, 0xB071u, 0x9D27u, 0x5F85u, 0x72D3u, 0x0529u, 0x287Fu,
> + 0x7C42u, 0x5114u, 0x26EEu, 0x0BB8u, 0xC91Au, 0xE44Cu, 0x93B6u, 0xBEE0u,
> + 0x9D45u, 0xB013u, 0xC7E9u, 0xEABFu, 0x281Du, 0x054Bu, 0x72B1u, 0x5FE7u,
> + 0x35FBu, 0x18ADu, 0x6F57u, 0x4201u, 0x80A3u, 0xADF5u, 0xDA0Fu, 0xF759u,
> + 0xD4FCu, 0xF9AAu, 0x8E50u, 0xA306u, 0x61A4u, 0x4CF2u, 0x3B08u, 0x165Eu,
> + 0xEF30u, 0xC266u, 0xB59Cu, 0x98CAu, 0x5A68u, 0x773Eu, 0x00C4u, 0x2D92u,
> + 0x0E37u, 0x2361u, 0x549Bu, 0x79CDu, 0xBB6Fu, 0x9639u, 0xE1C3u, 0xCC95u,
> + 0xA689u, 0x8BDFu, 0xFC25u, 0xD173u, 0x13D1u, 0x3E87u, 0x497Du, 0x642Bu,
> + 0x478Eu, 0x6AD8u, 0x1D22u, 0x3074u, 0xF2D6u, 0xDF80u, 0xA87Au, 0x852Cu
> + },
> + {
> + 0x0000u, 0x2995u, 0x532Au, 0x7ABFu, 0xA654u, 0x8FC1u, 0xF57Eu, 0xDCEBu,
> + 0xC71Fu, 0xEE8Au, 0x9435u, 0xBDA0u, 0x614Bu, 0x48DEu, 0x3261u, 0x1BF4u,
> + 0x0589u, 0x2C1Cu, 0x56A3u, 0x7F36u, 0xA3DDu, 0x8A48u, 0xF0F7u, 0xD962u,
> + 0xC296u, 0xEB03u, 0x91BCu, 0xB829u, 0x64C2u, 0x4D57u, 0x37E8u, 0x1E7Du,
> + 0x0B12u, 0x2287u, 0x5838u, 0x71ADu, 0xAD46u, 0x84D3u, 0xFE6Cu, 0xD7F9u,
> + 0xCC0Du, 0xE598u, 0x9F27u, 0xB6B2u, 0x6A59u, 0x43CCu, 0x3973u, 0x10E6u,
> + 0x0E9Bu, 0x270Eu, 0x5DB1u, 0x7424u, 0xA8CFu, 0x815Au, 0xFBE5u, 0xD270u,
> + 0xC984u, 0xE011u, 0x9AAEu, 0xB33Bu, 0x6FD0u, 0x4645u, 0x3CFAu, 0x156Fu,
> + 0x1624u, 0x3FB1u, 0x450Eu, 0x6C9Bu, 0xB070u, 0x99E5u, 0xE35Au, 0xCACFu,
> + 0xD13Bu, 0xF8AEu, 0x8211u, 0xAB84u, 0x776Fu, 0x5EFAu, 0x2445u, 0x0DD0u,
> + 0x13ADu, 0x3A38u, 0x4087u, 0x6912u, 0xB5F9u, 0x9C6Cu, 0xE6D3u, 0xCF46u,
> + 0xD4B2u, 0xFD27u, 0x8798u, 0xAE0Du, 0x72E6u, 0x5B73u, 0x21CCu, 0x0859u,
> + 0x1D36u, 0x34A3u, 0x4E1Cu, 0x6789u, 0xBB62u, 0x92F7u, 0xE848u, 0xC1DDu,
> + 0xDA29u, 0xF3BCu, 0x8903u, 0xA096u, 0x7C7Du, 0x55E8u, 0x2F57u, 0x06C2u,
> + 0x18BFu, 0x312Au, 0x4B95u, 0x6200u, 0xBEEBu, 0x977Eu, 0xEDC1u, 0xC454u,
> + 0xDFA0u, 0xF635u, 0x8C8Au, 0xA51Fu, 0x79F4u, 0x5061u, 0x2ADEu, 0x034Bu,
> + 0x2C48u, 0x05DDu, 0x7F62u, 0x56F7u, 0x8A1Cu, 0xA389u, 0xD936u, 0xF0A3u,
> + 0xEB57u, 0xC2C2u, 0xB87Du, 0x91E8u, 0x4D03u, 0x6496u, 0x1E29u, 0x37BCu,
> + 0x29C1u, 0x0054u, 0x7AEBu, 0x537Eu, 0x8F95u, 0xA600u, 0xDCBFu, 0xF52Au,
> + 0xEEDEu, 0xC74Bu, 0xBDF4u, 0x9461u, 0x488Au, 0x611Fu, 0x1BA0u, 0x3235u,
> + 0x275Au, 0x0ECFu, 0x7470u, 0x5DE5u, 0x810Eu, 0xA89Bu, 0xD224u, 0xFBB1u,
> + 0xE045u, 0xC9D0u, 0xB36Fu, 0x9AFAu, 0x4611u, 0x6F84u, 0x153Bu, 0x3CAEu,
> + 0x22D3u, 0x0B46u, 0x71F9u, 0x586Cu, 0x8487u, 0xAD12u, 0xD7ADu, 0xFE38u,
> + 0xE5CCu, 0xCC59u, 0xB6E6u, 0x9F73u, 0x4398u, 0x6A0Du, 0x10B2u, 0x3927u,
> + 0x3A6Cu, 0x13F9u, 0x6946u, 0x40D3u, 0x9C38u, 0xB5ADu, 0xCF12u, 0xE687u,
> + 0xFD73u, 0xD4E6u, 0xAE59u, 0x87CCu, 0x5B27u, 0x72B2u, 0x080Du, 0x2198u,
> + 0x3FE5u, 0x1670u, 0x6CCFu, 0x455Au, 0x99B1u, 0xB024u, 0xCA9Bu, 0xE30Eu,
> + 0xF8FAu, 0xD16Fu, 0xABD0u, 0x8245u, 0x5EAEu, 0x773Bu, 0x0D84u, 0x2411u,
> + 0x317Eu, 0x18EBu, 0x6254u, 0x4BC1u, 0x972Au, 0xBEBFu, 0xC400u, 0xED95u,
> + 0xF661u, 0xDFF4u, 0xA54Bu, 0x8CDEu, 0x5035u, 0x79A0u, 0x031Fu, 0x2A8Au,
> + 0x34F7u, 0x1D62u, 0x67DDu, 0x4E48u, 0x92A3u, 0xBB36u, 0xC189u, 0xE81Cu,
> + 0xF3E8u, 0xDA7Du, 0xA0C2u, 0x8957u, 0x55BCu, 0x7C29u, 0x0696u, 0x2F03u
> + },
> + {
> + 0x0000u, 0x5890u, 0xB120u, 0xE9B0u, 0xE9F7u, 0xB167u, 0x58D7u, 0x0047u,
> + 0x5859u, 0x00C9u, 0xE979u, 0xB1E9u, 0xB1AEu, 0xE93Eu, 0x008Eu, 0x581Eu,
> + 0xB0B2u, 0xE822u, 0x0192u, 0x5902u, 0x5945u, 0x01D5u, 0xE865u, 0xB0F5u,
> + 0xE8EBu, 0xB07Bu, 0x59CBu, 0x015Bu, 0x011Cu, 0x598Cu, 0xB03Cu, 0xE8ACu,
> + 0xEAD3u, 0xB243u, 0x5BF3u, 0x0363u, 0x0324u, 0x5BB4u, 0xB204u, 0xEA94u,
> + 0xB28Au, 0xEA1Au, 0x03AAu, 0x5B3Au, 0x5B7Du, 0x03EDu, 0xEA5Du, 0xB2CDu,
> + 0x5A61u, 0x02F1u, 0xEB41u, 0xB3D1u, 0xB396u, 0xEB06u, 0x02B6u, 0x5A26u,
> + 0x0238u, 0x5AA8u, 0xB318u, 0xEB88u, 0xEBCFu, 0xB35Fu, 0x5AEFu, 0x027Fu,
> + 0x5E11u, 0x0681u, 0xEF31u, 0xB7A1u, 0xB7E6u, 0xEF76u, 0x06C6u, 0x5E56u,
> + 0x0648u, 0x5ED8u, 0xB768u, 0xEFF8u, 0xEFBFu, 0xB72Fu, 0x5E9Fu, 0x060Fu,
> + 0xEEA3u, 0xB633u, 0x5F83u, 0x0713u, 0x0754u, 0x5FC4u, 0xB674u, 0xEEE4u,
> + 0xB6FAu, 0xEE6Au, 0x07DAu, 0x5F4Au, 0x5F0Du, 0x079Du, 0xEE2Du, 0xB6BDu,
> + 0xB4C2u, 0xEC52u, 0x05E2u, 0x5D72u, 0x5D35u, 0x05A5u, 0xEC15u, 0xB485u,
> + 0xEC9Bu, 0xB40Bu, 0x5DBBu, 0x052Bu, 0x056Cu, 0x5DFCu, 0xB44Cu, 0xECDCu,
> + 0x0470u, 0x5CE0u, 0xB550u, 0xEDC0u, 0xED87u, 0xB517u, 0x5CA7u, 0x0437u,
> + 0x5C29u, 0x04B9u, 0xED09u, 0xB599u, 0xB5DEu, 0xED4Eu, 0x04FEu, 0x5C6Eu,
> + 0xBC22u, 0xE4B2u, 0x0D02u, 0x5592u, 0x55D5u, 0x0D45u, 0xE4F5u, 0xBC65u,
> + 0xE47Bu, 0xBCEBu, 0x555Bu, 0x0DCBu, 0x0D8Cu, 0x551Cu, 0xBCACu, 0xE43Cu,
> + 0x0C90u, 0x5400u, 0xBDB0u, 0xE520u, 0xE567u, 0xBDF7u, 0x5447u, 0x0CD7u,
> + 0x54C9u, 0x0C59u, 0xE5E9u, 0xBD79u, 0xBD3Eu, 0xE5AEu, 0x0C1Eu, 0x548Eu,
> + 0x56F1u, 0x0E61u, 0xE7D1u, 0xBF41u, 0xBF06u, 0xE796u, 0x0E26u, 0x56B6u,
> + 0x0EA8u, 0x5638u, 0xBF88u, 0xE718u, 0xE75Fu, 0xBFCFu, 0x567Fu, 0x0EEFu,
> + 0xE643u, 0xBED3u, 0x5763u, 0x0FF3u, 0x0FB4u, 0x5724u, 0xBE94u, 0xE604u,
> + 0xBE1Au, 0xE68Au, 0x0F3Au, 0x57AAu, 0x57EDu, 0x0F7Du, 0xE6CDu, 0xBE5Du,
> + 0xE233u, 0xBAA3u, 0x5313u, 0x0B83u, 0x0BC4u, 0x5354u, 0xBAE4u, 0xE274u,
> + 0xBA6Au, 0xE2FAu, 0x0B4Au, 0x53DAu, 0x539Du, 0x0B0Du, 0xE2BDu, 0xBA2Du,
> + 0x5281u, 0x0A11u, 0xE3A1u, 0xBB31u, 0xBB76u, 0xE3E6u, 0x0A56u, 0x52C6u,
> + 0x0AD8u, 0x5248u, 0xBBF8u, 0xE368u, 0xE32Fu, 0xBBBFu, 0x520Fu, 0x0A9Fu,
> + 0x08E0u, 0x5070u, 0xB9C0u, 0xE150u, 0xE117u, 0xB987u, 0x5037u, 0x08A7u,
> + 0x50B9u, 0x0829u, 0xE199u, 0xB909u, 0xB94Eu, 0xE1DEu, 0x086Eu, 0x50FEu,
> + 0xB852u, 0xE0C2u, 0x0972u, 0x51E2u, 0x51A5u, 0x0935u, 0xE085u, 0xB815u,
> + 0xE00Bu, 0xB89Bu, 0x512Bu, 0x09BBu, 0x09FCu, 0x516Cu, 0xB8DCu, 0xE04Cu
> + },
> + {
> + 0x0000u, 0xF3F3u, 0x6C51u, 0x9FA2u, 0xD8A2u, 0x2B51u, 0xB4F3u, 0x4700u,
> + 0x3AF3u, 0xC900u, 0x56A2u, 0xA551u, 0xE251u, 0x11A2u, 0x8E00u, 0x7DF3u,
> + 0x75E6u, 0x8615u, 0x19B7u, 0xEA44u, 0xAD44u, 0x5EB7u, 0xC115u, 0x32E6u,
> + 0x4F15u, 0xBCE6u, 0x2344u, 0xD0B7u, 0x97B7u, 0x6444u, 0xFBE6u, 0x0815u,
> + 0xEBCCu, 0x183Fu, 0x879Du, 0x746Eu, 0x336Eu, 0xC09Du, 0x5F3Fu, 0xACCCu,
> + 0xD13Fu, 0x22CCu, 0xBD6Eu, 0x4E9Du, 0x099Du, 0xFA6Eu, 0x65CCu, 0x963Fu,
> + 0x9E2Au, 0x6DD9u, 0xF27Bu, 0x0188u, 0x4688u, 0xB57Bu, 0x2AD9u, 0xD92Au,
> + 0xA4D9u, 0x572Au, 0xC888u, 0x3B7Bu, 0x7C7Bu, 0x8F88u, 0x102Au, 0xE3D9u,
> + 0x5C2Fu, 0xAFDCu, 0x307Eu, 0xC38Du, 0x848Du, 0x777Eu, 0xE8DCu, 0x1B2Fu,
> + 0x66DCu, 0x952Fu, 0x0A8Du, 0xF97Eu, 0xBE7Eu, 0x4D8Du, 0xD22Fu, 0x21DCu,
> + 0x29C9u, 0xDA3Au, 0x4598u, 0xB66Bu, 0xF16Bu, 0x0298u, 0x9D3Au, 0x6EC9u,
> + 0x133Au, 0xE0C9u, 0x7F6Bu, 0x8C98u, 0xCB98u, 0x386Bu, 0xA7C9u, 0x543Au,
> + 0xB7E3u, 0x4410u, 0xDBB2u, 0x2841u, 0x6F41u, 0x9CB2u, 0x0310u, 0xF0E3u,
> + 0x8D10u, 0x7EE3u, 0xE141u, 0x12B2u, 0x55B2u, 0xA641u, 0x39E3u, 0xCA10u,
> + 0xC205u, 0x31F6u, 0xAE54u, 0x5DA7u, 0x1AA7u, 0xE954u, 0x76F6u, 0x8505u,
> + 0xF8F6u, 0x0B05u, 0x94A7u, 0x6754u, 0x2054u, 0xD3A7u, 0x4C05u, 0xBFF6u,
> + 0xB85Eu, 0x4BADu, 0xD40Fu, 0x27FCu, 0x60FCu, 0x930Fu, 0x0CADu, 0xFF5Eu,
> + 0x82ADu, 0x715Eu, 0xEEFCu, 0x1D0Fu, 0x5A0Fu, 0xA9FCu, 0x365Eu, 0xC5ADu,
> + 0xCDB8u, 0x3E4Bu, 0xA1E9u, 0x521Au, 0x151Au, 0xE6E9u, 0x794Bu, 0x8AB8u,
> + 0xF74Bu, 0x04B8u, 0x9B1Au, 0x68E9u, 0x2FE9u, 0xDC1Au, 0x43B8u, 0xB04Bu,
> + 0x5392u, 0xA061u, 0x3FC3u, 0xCC30u, 0x8B30u, 0x78C3u, 0xE761u, 0x1492u,
> + 0x6961u, 0x9A92u, 0x0530u, 0xF6C3u, 0xB1C3u, 0x4230u, 0xDD92u, 0x2E61u,
> + 0x2674u, 0xD587u, 0x4A25u, 0xB9D6u, 0xFED6u, 0x0D25u, 0x9287u, 0x6174u,
> + 0x1C87u, 0xEF74u, 0x70D6u, 0x8325u, 0xC425u, 0x37D6u, 0xA874u, 0x5B87u,
> + 0xE471u, 0x1782u, 0x8820u, 0x7BD3u, 0x3CD3u, 0xCF20u, 0x5082u, 0xA371u,
> + 0xDE82u, 0x2D71u, 0xB2D3u, 0x4120u, 0x0620u, 0xF5D3u, 0x6A71u, 0x9982u,
> + 0x9197u, 0x6264u, 0xFDC6u, 0x0E35u, 0x4935u, 0xBAC6u, 0x2564u, 0xD697u,
> + 0xAB64u, 0x5897u, 0xC735u, 0x34C6u, 0x73C6u, 0x8035u, 0x1F97u, 0xEC64u,
> + 0x0FBDu, 0xFC4Eu, 0x63ECu, 0x901Fu, 0xD71Fu, 0x24ECu, 0xBB4Eu, 0x48BDu,
> + 0x354Eu, 0xC6BDu, 0x591Fu, 0xAAECu, 0xEDECu, 0x1E1Fu, 0x81BDu, 0x724Eu,
> + 0x7A5Bu, 0x89A8u, 0x160Au, 0xE5F9u, 0xA2F9u, 0x510Au, 0xCEA8u, 0x3D5Bu,
> + 0x40A8u, 0xB35Bu, 0x2CF9u, 0xDF0Au, 0x980Au, 0x6BF9u, 0xF45Bu, 0x07A8u
> + },
> + {
> + 0x0000u, 0xFB0Bu, 0x7DA1u, 0x86AAu, 0xFB42u, 0x0049u, 0x86E3u, 0x7DE8u,
> + 0x7D33u, 0x8638u, 0x0092u, 0xFB99u, 0x8671u, 0x7D7Au, 0xFBD0u, 0x00DBu,
> + 0xFA66u, 0x016Du, 0x87C7u, 0x7CCCu, 0x0124u, 0xFA2Fu, 0x7C85u, 0x878Eu,
> + 0x8755u, 0x7C5Eu, 0xFAF4u, 0x01FFu, 0x7C17u, 0x871Cu, 0x01B6u, 0xFABDu,
> + 0x7F7Bu, 0x8470u, 0x02DAu, 0xF9D1u, 0x8439u, 0x7F32u, 0xF998u, 0x0293u,
> + 0x0248u, 0xF943u, 0x7FE9u, 0x84E2u, 0xF90Au, 0x0201u, 0x84ABu, 0x7FA0u,
> + 0x851Du, 0x7E16u, 0xF8BCu, 0x03B7u, 0x7E5Fu, 0x8554u, 0x03FEu, 0xF8F5u,
> + 0xF82Eu, 0x0325u, 0x858Fu, 0x7E84u, 0x036Cu, 0xF867u, 0x7ECDu, 0x85C6u,
> + 0xFEF6u, 0x05FDu, 0x8357u, 0x785Cu, 0x05B4u, 0xFEBFu, 0x7815u, 0x831Eu,
> + 0x83C5u, 0x78CEu, 0xFE64u, 0x056Fu, 0x7887u, 0x838Cu, 0x0526u, 0xFE2Du,
> + 0x0490u, 0xFF9Bu, 0x7931u, 0x823Au, 0xFFD2u, 0x04D9u, 0x8273u, 0x7978u,
> + 0x79A3u, 0x82A8u, 0x0402u, 0xFF09u, 0x82E1u, 0x79EAu, 0xFF40u, 0x044Bu,
> + 0x818Du, 0x7A86u, 0xFC2Cu, 0x0727u, 0x7ACFu, 0x81C4u, 0x076Eu, 0xFC65u,
> + 0xFCBEu, 0x07B5u, 0x811Fu, 0x7A14u, 0x07FCu, 0xFCF7u, 0x7A5Du, 0x8156u,
> + 0x7BEBu, 0x80E0u, 0x064Au, 0xFD41u, 0x80A9u, 0x7BA2u, 0xFD08u, 0x0603u,
> + 0x06D8u, 0xFDD3u, 0x7B79u, 0x8072u, 0xFD9Au, 0x0691u, 0x803Bu, 0x7B30u,
> + 0x765Bu, 0x8D50u, 0x0BFAu, 0xF0F1u, 0x8D19u, 0x7612u, 0xF0B8u, 0x0BB3u,
> + 0x0B68u, 0xF063u, 0x76C9u, 0x8DC2u, 0xF02Au, 0x0B21u, 0x8D8Bu, 0x7680u,
> + 0x8C3Du, 0x7736u, 0xF19Cu, 0x0A97u, 0x777Fu, 0x8C74u, 0x0ADEu, 0xF1D5u,
> + 0xF10Eu, 0x0A05u, 0x8CAFu, 0x77A4u, 0x0A4Cu, 0xF147u, 0x77EDu, 0x8CE6u,
> + 0x0920u, 0xF22Bu, 0x7481u, 0x8F8Au, 0xF262u, 0x0969u, 0x8FC3u, 0x74C8u,
> + 0x7413u, 0x8F18u, 0x09B2u, 0xF2B9u, 0x8F51u, 0x745Au, 0xF2F0u, 0x09FBu,
> + 0xF346u, 0x084Du, 0x8EE7u, 0x75ECu, 0x0804u, 0xF30Fu, 0x75A5u, 0x8EAEu,
> + 0x8E75u, 0x757Eu, 0xF3D4u, 0x08DFu, 0x7537u, 0x8E3Cu, 0x0896u, 0xF39Du,
> + 0x88ADu, 0x73A6u, 0xF50Cu, 0x0E07u, 0x73EFu, 0x88E4u, 0x0E4Eu, 0xF545u,
> + 0xF59Eu, 0x0E95u, 0x883Fu, 0x7334u, 0x0EDCu, 0xF5D7u, 0x737Du, 0x8876u,
> + 0x72CBu, 0x89C0u, 0x0F6Au, 0xF461u, 0x8989u, 0x7282u, 0xF428u, 0x0F23u,
> + 0x0FF8u, 0xF4F3u, 0x7259u, 0x8952u, 0xF4BAu, 0x0FB1u, 0x891Bu, 0x7210u,
> + 0xF7D6u, 0x0CDDu, 0x8A77u, 0x717Cu, 0x0C94u, 0xF79Fu, 0x7135u, 0x8A3Eu,
> + 0x8AE5u, 0x71EEu, 0xF744u, 0x0C4Fu, 0x71A7u, 0x8AACu, 0x0C06u, 0xF70Du,
> + 0x0DB0u, 0xF6BBu, 0x7011u, 0x8B1Au, 0xF6F2u, 0x0DF9u, 0x8B53u, 0x7058u,
> + 0x7083u, 0x8B88u, 0x0D22u, 0xF629u, 0x8BC1u, 0x70CAu, 0xF660u, 0x0D6Bu
> + },
> + {
> + 0x0000u, 0xECB6u, 0x52DBu, 0xBE6Du, 0xA5B6u, 0x4900u, 0xF76Du, 0x1BDBu,
> + 0xC0DBu, 0x2C6Du, 0x9200u, 0x7EB6u, 0x656Du, 0x89DBu, 0x37B6u, 0xDB00u,
> + 0x0A01u, 0xE6B7u, 0x58DAu, 0xB46Cu, 0xAFB7u, 0x4301u, 0xFD6Cu, 0x11DAu,
> + 0xCADAu, 0x266Cu, 0x9801u, 0x74B7u, 0x6F6Cu, 0x83DAu, 0x3DB7u, 0xD101u,
> + 0x1402u, 0xF8B4u, 0x46D9u, 0xAA6Fu, 0xB1B4u, 0x5D02u, 0xE36Fu, 0x0FD9u,
> + 0xD4D9u, 0x386Fu, 0x8602u, 0x6AB4u, 0x716Fu, 0x9DD9u, 0x23B4u, 0xCF02u,
> + 0x1E03u, 0xF2B5u, 0x4CD8u, 0xA06Eu, 0xBBB5u, 0x5703u, 0xE96Eu, 0x05D8u,
> + 0xDED8u, 0x326Eu, 0x8C03u, 0x60B5u, 0x7B6Eu, 0x97D8u, 0x29B5u, 0xC503u,
> + 0x2804u, 0xC4B2u, 0x7ADFu, 0x9669u, 0x8DB2u, 0x6104u, 0xDF69u, 0x33DFu,
> + 0xE8DFu, 0x0469u, 0xBA04u, 0x56B2u, 0x4D69u, 0xA1DFu, 0x1FB2u, 0xF304u,
> + 0x2205u, 0xCEB3u, 0x70DEu, 0x9C68u, 0x87B3u, 0x6B05u, 0xD568u, 0x39DEu,
> + 0xE2DEu, 0x0E68u, 0xB005u, 0x5CB3u, 0x4768u, 0xABDEu, 0x15B3u, 0xF905u,
> + 0x3C06u, 0xD0B0u, 0x6EDDu, 0x826Bu, 0x99B0u, 0x7506u, 0xCB6Bu, 0x27DDu,
> + 0xFCDDu, 0x106Bu, 0xAE06u, 0x42B0u, 0x596Bu, 0xB5DDu, 0x0BB0u, 0xE706u,
> + 0x3607u, 0xDAB1u, 0x64DCu, 0x886Au, 0x93B1u, 0x7F07u, 0xC16Au, 0x2DDCu,
> + 0xF6DCu, 0x1A6Au, 0xA407u, 0x48B1u, 0x536Au, 0xBFDCu, 0x01B1u, 0xED07u,
> + 0x5008u, 0xBCBEu, 0x02D3u, 0xEE65u, 0xF5BEu, 0x1908u, 0xA765u, 0x4BD3u,
> + 0x90D3u, 0x7C65u, 0xC208u, 0x2EBEu, 0x3565u, 0xD9D3u, 0x67BEu, 0x8B08u,
> + 0x5A09u, 0xB6BFu, 0x08D2u, 0xE464u, 0xFFBFu, 0x1309u, 0xAD64u, 0x41D2u,
> + 0x9AD2u, 0x7664u, 0xC809u, 0x24BFu, 0x3F64u, 0xD3D2u, 0x6DBFu, 0x8109u,
> + 0x440Au, 0xA8BCu, 0x16D1u, 0xFA67u, 0xE1BCu, 0x0D0Au, 0xB367u, 0x5FD1u,
> + 0x84D1u, 0x6867u, 0xD60Au, 0x3ABCu, 0x2167u, 0xCDD1u, 0x73BCu, 0x9F0Au,
> + 0x4E0Bu, 0xA2BDu, 0x1CD0u, 0xF066u, 0xEBBDu, 0x070Bu, 0xB966u, 0x55D0u,
> + 0x8ED0u, 0x6266u, 0xDC0Bu, 0x30BDu, 0x2B66u, 0xC7D0u, 0x79BDu, 0x950Bu,
> + 0x780Cu, 0x94BAu, 0x2AD7u, 0xC661u, 0xDDBAu, 0x310Cu, 0x8F61u, 0x63D7u,
> + 0xB8D7u, 0x5461u, 0xEA0Cu, 0x06BAu, 0x1D61u, 0xF1D7u, 0x4FBAu, 0xA30Cu,
> + 0x720Du, 0x9EBBu, 0x20D6u, 0xCC60u, 0xD7BBu, 0x3B0Du, 0x8560u, 0x69D6u,
> + 0xB2D6u, 0x5E60u, 0xE00Du, 0x0CBBu, 0x1760u, 0xFBD6u, 0x45BBu, 0xA90Du,
> + 0x6C0Eu, 0x80B8u, 0x3ED5u, 0xD263u, 0xC9B8u, 0x250Eu, 0x9B63u, 0x77D5u,
> + 0xACD5u, 0x4063u, 0xFE0Eu, 0x12B8u, 0x0963u, 0xE5D5u, 0x5BB8u, 0xB70Eu,
> + 0x660Fu, 0x8AB9u, 0x34D4u, 0xD862u, 0xC3B9u, 0x2F0Fu, 0x9162u, 0x7DD4u,
> + 0xA6D4u, 0x4A62u, 0xF40Fu, 0x18B9u, 0x0362u, 0xEFD4u, 0x51B9u, 0xBD0Fu
> + },
> + {
> + 0x0000u, 0xA010u, 0xCB97u, 0x6B87u, 0x1C99u, 0xBC89u, 0xD70Eu, 0x771Eu,
> + 0x3932u, 0x9922u, 0xF2A5u, 0x52B5u, 0x25ABu, 0x85BBu, 0xEE3Cu, 0x4E2Cu,
> + 0x7264u, 0xD274u, 0xB9F3u, 0x19E3u, 0x6EFDu, 0xCEEDu, 0xA56Au, 0x057Au,
> + 0x4B56u, 0xEB46u, 0x80C1u, 0x20D1u, 0x57CFu, 0xF7DFu, 0x9C58u, 0x3C48u,
> + 0xE4C8u, 0x44D8u, 0x2F5Fu, 0x8F4Fu, 0xF851u, 0x5841u, 0x33C6u, 0x93D6u,
> + 0xDDFAu, 0x7DEAu, 0x166Du, 0xB67Du, 0xC163u, 0x6173u, 0x0AF4u, 0xAAE4u,
> + 0x96ACu, 0x36BCu, 0x5D3Bu, 0xFD2Bu, 0x8A35u, 0x2A25u, 0x41A2u, 0xE1B2u,
> + 0xAF9Eu, 0x0F8Eu, 0x6409u, 0xC419u, 0xB307u, 0x1317u, 0x7890u, 0xD880u,
> + 0x4227u, 0xE237u, 0x89B0u, 0x29A0u, 0x5EBEu, 0xFEAEu, 0x9529u, 0x3539u,
> + 0x7B15u, 0xDB05u, 0xB082u, 0x1092u, 0x678Cu, 0xC79Cu, 0xAC1Bu, 0x0C0Bu,
> + 0x3043u, 0x9053u, 0xFBD4u, 0x5BC4u, 0x2CDAu, 0x8CCAu, 0xE74Du, 0x475Du,
> + 0x0971u, 0xA961u, 0xC2E6u, 0x62F6u, 0x15E8u, 0xB5F8u, 0xDE7Fu, 0x7E6Fu,
> + 0xA6EFu, 0x06FFu, 0x6D78u, 0xCD68u, 0xBA76u, 0x1A66u, 0x71E1u, 0xD1F1u,
> + 0x9FDDu, 0x3FCDu, 0x544Au, 0xF45Au, 0x8344u, 0x2354u, 0x48D3u, 0xE8C3u,
> + 0xD48Bu, 0x749Bu, 0x1F1Cu, 0xBF0Cu, 0xC812u, 0x6802u, 0x0385u, 0xA395u,
> + 0xEDB9u, 0x4DA9u, 0x262Eu, 0x863Eu, 0xF120u, 0x5130u, 0x3AB7u, 0x9AA7u,
> + 0x844Eu, 0x245Eu, 0x4FD9u, 0xEFC9u, 0x98D7u, 0x38C7u, 0x5340u, 0xF350u,
> + 0xBD7Cu, 0x1D6Cu, 0x76EBu, 0xD6FBu, 0xA1E5u, 0x01F5u, 0x6A72u, 0xCA62u,
> + 0xF62Au, 0x563Au, 0x3DBDu, 0x9DADu, 0xEAB3u, 0x4AA3u, 0x2124u, 0x8134u,
> + 0xCF18u, 0x6F08u, 0x048Fu, 0xA49Fu, 0xD381u, 0x7391u, 0x1816u, 0xB806u,
> + 0x6086u, 0xC096u, 0xAB11u, 0x0B01u, 0x7C1Fu, 0xDC0Fu, 0xB788u, 0x1798u,
> + 0x59B4u, 0xF9A4u, 0x9223u, 0x3233u, 0x452Du, 0xE53Du, 0x8EBAu, 0x2EAAu,
> + 0x12E2u, 0xB2F2u, 0xD975u, 0x7965u, 0x0E7Bu, 0xAE6Bu, 0xC5ECu, 0x65FCu,
> + 0x2BD0u, 0x8BC0u, 0xE047u, 0x4057u, 0x3749u, 0x9759u, 0xFCDEu, 0x5CCEu,
> + 0xC669u, 0x6679u, 0x0DFEu, 0xADEEu, 0xDAF0u, 0x7AE0u, 0x1167u, 0xB177u,
> + 0xFF5Bu, 0x5F4Bu, 0x34CCu, 0x94DCu, 0xE3C2u, 0x43D2u, 0x2855u, 0x8845u,
> + 0xB40Du, 0x141Du, 0x7F9Au, 0xDF8Au, 0xA894u, 0x0884u, 0x6303u, 0xC313u,
> + 0x8D3Fu, 0x2D2Fu, 0x46A8u, 0xE6B8u, 0x91A6u, 0x31B6u, 0x5A31u, 0xFA21u,
> + 0x22A1u, 0x82B1u, 0xE936u, 0x4926u, 0x3E38u, 0x9E28u, 0xF5AFu, 0x55BFu,
> + 0x1B93u, 0xBB83u, 0xD004u, 0x7014u, 0x070Au, 0xA71Au, 0xCC9Du, 0x6C8Du,
> + 0x50C5u, 0xF0D5u, 0x9B52u, 0x3B42u, 0x4C5Cu, 0xEC4Cu, 0x87CBu, 0x27DBu,
> + 0x69F7u, 0xC9E7u, 0xA260u, 0x0270u, 0x756Eu, 0xD57Eu, 0xBEF9u, 0x1EE9u
> + },
> + {
> + 0x0000u, 0x832Bu, 0x8DE1u, 0x0ECAu, 0x9075u, 0x135Eu, 0x1D94u, 0x9EBFu,
> + 0xAB5Du, 0x2876u, 0x26BCu, 0xA597u, 0x3B28u, 0xB803u, 0xB6C9u, 0x35E2u,
> + 0xDD0Du, 0x5E26u, 0x50ECu, 0xD3C7u, 0x4D78u, 0xCE53u, 0xC099u, 0x43B2u,
> + 0x7650u, 0xF57Bu, 0xFBB1u, 0x789Au, 0xE625u, 0x650Eu, 0x6BC4u, 0xE8EFu,
> + 0x31ADu, 0xB286u, 0xBC4Cu, 0x3F67u, 0xA1D8u, 0x22F3u, 0x2C39u, 0xAF12u,
> + 0x9AF0u, 0x19DBu, 0x1711u, 0x943Au, 0x0A85u, 0x89AEu, 0x8764u, 0x044Fu,
> + 0xECA0u, 0x6F8Bu, 0x6141u, 0xE26Au, 0x7CD5u, 0xFFFEu, 0xF134u, 0x721Fu,
> + 0x47FDu, 0xC4D6u, 0xCA1Cu, 0x4937u, 0xD788u, 0x54A3u, 0x5A69u, 0xD942u,
> + 0x635Au, 0xE071u, 0xEEBBu, 0x6D90u, 0xF32Fu, 0x7004u, 0x7ECEu, 0xFDE5u,
> + 0xC807u, 0x4B2Cu, 0x45E6u, 0xC6CDu, 0x5872u, 0xDB59u, 0xD593u, 0x56B8u,
> + 0xBE57u, 0x3D7Cu, 0x33B6u, 0xB09Du, 0x2E22u, 0xAD09u, 0xA3C3u, 0x20E8u,
> + 0x150Au, 0x9621u, 0x98EBu, 0x1BC0u, 0x857Fu, 0x0654u, 0x089Eu, 0x8BB5u,
> + 0x52F7u, 0xD1DCu, 0xDF16u, 0x5C3Du, 0xC282u, 0x41A9u, 0x4F63u, 0xCC48u,
> + 0xF9AAu, 0x7A81u, 0x744Bu, 0xF760u, 0x69DFu, 0xEAF4u, 0xE43Eu, 0x6715u,
> + 0x8FFAu, 0x0CD1u, 0x021Bu, 0x8130u, 0x1F8Fu, 0x9CA4u, 0x926Eu, 0x1145u,
> + 0x24A7u, 0xA78Cu, 0xA946u, 0x2A6Du, 0xB4D2u, 0x37F9u, 0x3933u, 0xBA18u,
> + 0xC6B4u, 0x459Fu, 0x4B55u, 0xC87Eu, 0x56C1u, 0xD5EAu, 0xDB20u, 0x580Bu,
> + 0x6DE9u, 0xEEC2u, 0xE008u, 0x6323u, 0xFD9Cu, 0x7EB7u, 0x707Du, 0xF356u,
> + 0x1BB9u, 0x9892u, 0x9658u, 0x1573u, 0x8BCCu, 0x08E7u, 0x062Du, 0x8506u,
> + 0xB0E4u, 0x33CFu, 0x3D05u, 0xBE2Eu, 0x2091u, 0xA3BAu, 0xAD70u, 0x2E5Bu,
> + 0xF719u, 0x7432u, 0x7AF8u, 0xF9D3u, 0x676Cu, 0xE447u, 0xEA8Du, 0x69A6u,
> + 0x5C44u, 0xDF6Fu, 0xD1A5u, 0x528Eu, 0xCC31u, 0x4F1Au, 0x41D0u, 0xC2FBu,
> + 0x2A14u, 0xA93Fu, 0xA7F5u, 0x24DEu, 0xBA61u, 0x394Au, 0x3780u, 0xB4ABu,
> + 0x8149u, 0x0262u, 0x0CA8u, 0x8F83u, 0x113Cu, 0x9217u, 0x9CDDu, 0x1FF6u,
> + 0xA5EEu, 0x26C5u, 0x280Fu, 0xAB24u, 0x359Bu, 0xB6B0u, 0xB87Au, 0x3B51u,
> + 0x0EB3u, 0x8D98u, 0x8352u, 0x0079u, 0x9EC6u, 0x1DEDu, 0x1327u, 0x900Cu,
> + 0x78E3u, 0xFBC8u, 0xF502u, 0x7629u, 0xE896u, 0x6BBDu, 0x6577u, 0xE65Cu,
> + 0xD3BEu, 0x5095u, 0x5E5Fu, 0xDD74u, 0x43CBu, 0xC0E0u, 0xCE2Au, 0x4D01u,
> + 0x9443u, 0x1768u, 0x19A2u, 0x9A89u, 0x0436u, 0x871Du, 0x89D7u, 0x0AFCu,
> + 0x3F1Eu, 0xBC35u, 0xB2FFu, 0x31D4u, 0xAF6Bu, 0x2C40u, 0x228Au, 0xA1A1u,
> + 0x494Eu, 0xCA65u, 0xC4AFu, 0x4784u, 0xD93Bu, 0x5A10u, 0x54DAu, 0xD7F1u,
> + 0xE213u, 0x6138u, 0x6FF2u, 0xECD9u, 0x7266u, 0xF14Du, 0xFF87u, 0x7CACu
> + }
> };
>
> __u16 crc_t10dif_generic(__u16 crc, const unsigned char *buffer,
> size_t len) {
> - unsigned int i;
> + const __u8 *i = (const __u8 *)buffer;
> + const __u8 *i_end = i + len;
> + const __u8 *i_last16 = i + (len / 16 * 16);

Why is i_last16 a u8? The len parameter of buffer can be much bigger than u8. Seems like the crc computation will miss the rest of the buffer if the length of buffer is much greater than 256.

Tim

>
> - for (i = 0 ; i < len ; i++)
> - crc = (crc << 8) ^ t10_dif_crc_table[((crc >> 8) ^ buffer[i]) & 0xff];
> + for (; i < i_last16; i += 16) {
> + crc = t10_dif_crc_table[15][i[0] ^ (__u8)(crc >> 8)] ^
> + t10_dif_crc_table[14][i[1] ^ (__u8)(crc >> 0)] ^
> + t10_dif_crc_table[13][i[2]] ^
> + t10_dif_crc_table[12][i[3]] ^
> + t10_dif_crc_table[11][i[4]] ^
> + t10_dif_crc_table[10][i[5]] ^
> + t10_dif_crc_table[9][i[6]] ^
> + t10_dif_crc_table[8][i[7]] ^
> + t10_dif_crc_table[7][i[8]] ^
> + t10_dif_crc_table[6][i[9]] ^
> + t10_dif_crc_table[5][i[10]] ^
> + t10_dif_crc_table[4][i[11]] ^
> + t10_dif_crc_table[3][i[12]] ^
> + t10_dif_crc_table[2][i[13]] ^
> + t10_dif_crc_table[1][i[14]] ^
> + t10_dif_crc_table[0][i[15]];
> + }
> +
> + for (; i < i_end; i++)
> + crc = t10_dif_crc_table[0][*i ^ (__u8)(crc >> 8)] ^ (crc << 8);
>
> return crc;
> }
>
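
On the `i_last16` question raised above: in the quoted patch, `i_last16` is declared as `const __u8 *`, which is a pointer to bytes, not an 8-bit integer. The range it can cover is therefore set by the pointer width and by `size_t len`, not by `__u8`. Below is a minimal user-space sketch of that point. It assumes `uint8_t` in place of `__u8`, and the helper name `bulk_bytes` is hypothetical (it is not part of the patch).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): returns how many bytes the
 * 16-bytes-per-iteration loop would consume for a buffer of length len.
 * The caller must pass a buffer that really holds len bytes.
 */
static size_t bulk_bytes(const uint8_t *buffer, size_t len)
{
	const uint8_t *p = buffer;                     /* pointer to bytes */
	const uint8_t *p_last16 = p + (len / 16 * 16); /* end of last full 16-byte block */

	return (size_t)(p_last16 - p);                 /* pointer difference, not a u8 value */
}
```

For a 1000-byte buffer this yields 992, which leaves 8 bytes for the byte-wise tail loop in the patch.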

2018-08-15 18:31:34

by Pavel Machek

Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.

Hi!

> This patch provides a performance improvement for the CRC16 calculations done in read/write
> workloads using the T10 Type 1/2/3 guard field. For example, today with sequential write
> workloads (one thread/CPU of IO) we consume 100% of the CPU because of the CRC16 computation
> bottleneck. Today's block devices are considerably faster, but the CRC16 calculation prevents
> folks from utilizing the throughput of such devices. To speed up this calculation and expose
> the block device throughput, we slice the old single byte for loop into a 16 byte for loop,
> with a larger CRC table to match. The result has shown 5x performance improvements on various
> big endian and little endian systems running the 4.18.0 kernel version.

Well, an 8K table fits in cache easily, and when running a benchmark the table will be cached, so it is
a win... Is it also a win on non-benchmark workloads and smaller systems?

Pavel

2018-08-16 14:02:41

by Jeffrey Lien

Subject: RE: [PATCH] Performance Improvement in CRC16 Calculations.

Eric,
We did not test the slice by 4 or 8 tables. I'm not sure of the value of doing that since the slice by 16 will provide the best performance gain. If I'm missing anything here, please let me know.

I'm working on a new version of the patch based on the feedback from others and will also change the pointer variables to start with p and fix the indenting you mentioned below in the new version of the patch.

Thanks

Jeff Lien

-----Original Message-----
From: Eric Biggers [mailto:[email protected]]
Sent: Friday, August 10, 2018 3:16 PM
To: Jeffrey Lien <[email protected]>
Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; David Darrington <[email protected]>; Jeff Furlong <[email protected]>
Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.

On Fri, Aug 10, 2018 at 02:12:11PM -0500, Jeff Lien wrote:
> This patch provides a performance improvement for the CRC16
> calculations done in read/write workloads using the T10 Type 1/2/3
> guard field. For example, today with sequential write workloads (one
> thread/CPU of IO) we consume 100% of the CPU because of the CRC16
> computation bottleneck. Today's block devices are considerably
> faster, but the CRC16 calculation prevents folks from utilizing the
> throughput of such devices. To speed up this calculation and expose
> the block device throughput, we slice the old single byte for loop into a 16 byte for loop, with a larger CRC table to match. The result has shown 5x performance improvements on various big endian and little endian systems running the 4.18.0 kernel version.
>
> FIO Sequential Write, 64K Block Size, Queue Depth 64
> BE Base Kernel: bw=201.5 MiB/s
> BE Modified CRC Calc: bw=968.1 MiB/s
> 4.80x performance improvement
>
> LE Base Kernel: bw=357 MiB/s
> LE Modified CRC Calc: bw=1964 MiB/s
> 5.51x performance improvement
>
> FIO Sequential Read, 64K Block Size, Queue Depth 64
> BE Base Kernel: bw=611.2 MiB/s
> BE Modified CRC calc: bw=684.9 MiB/s
> 1.12x performance improvement
>
> LE Base Kernel: bw=797 MiB/s
> LE Modified CRC Calc: bw=2730 MiB/s
> 3.42x performance improvement

Did you also test the slice-by-4 (requires 2048-byte table) and slice-by-8 (requires 4096-byte table) methods? Your proposal is slice-by-16 (requires 8192-byte table); the original was slice-by-1 (requires 512-byte table).
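
The table sizes quoted above follow from one 256-entry table of 16-bit values per byte consumed in each loop iteration. A small sketch of that arithmetic, assuming a hypothetical helper name `crc16_table_bytes`:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes of lookup table needed for a slice-by-n CRC16. */
static size_t crc16_table_bytes(size_t n_slices)
{
	/* One 256-entry table of 16-bit values per slice. */
	return n_slices * 256 * sizeof(uint16_t);
}
```

With n_slices set to 1, 4, 8 and 16 this gives 512, 2048, 4096 and 8192 bytes respectively.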

> __u16 crc_t10dif_generic(__u16 crc, const unsigned char *buffer,
> size_t len) {
> - unsigned int i;
> + const __u8 *i = (const __u8 *)buffer;
> + const __u8 *i_end = i + len;
> + const __u8 *i_last16 = i + (len / 16 * 16);

'i' is normally a loop counter, not a pointer.
Use 'p', 'p_end', and 'p_last16'.

>
> - for (i = 0 ; i < len ; i++)
> - crc = (crc << 8) ^ t10_dif_crc_table[((crc >> 8) ^ buffer[i]) & 0xff];
> + for (; i < i_last16; i += 16) {
> + crc = t10_dif_crc_table[15][i[0] ^ (__u8)(crc >> 8)] ^
> + t10_dif_crc_table[14][i[1] ^ (__u8)(crc >> 0)] ^
> + t10_dif_crc_table[13][i[2]] ^
> + t10_dif_crc_table[12][i[3]] ^
> + t10_dif_crc_table[11][i[4]] ^
> + t10_dif_crc_table[10][i[5]] ^
> + t10_dif_crc_table[9][i[6]] ^
> + t10_dif_crc_table[8][i[7]] ^
> + t10_dif_crc_table[7][i[8]] ^
> + t10_dif_crc_table[6][i[9]] ^
> + t10_dif_crc_table[5][i[10]] ^
> + t10_dif_crc_table[4][i[11]] ^
> + t10_dif_crc_table[3][i[12]] ^
> + t10_dif_crc_table[2][i[13]] ^
> + t10_dif_crc_table[1][i[14]] ^
> + t10_dif_crc_table[0][i[15]];
> + }

Please indent this properly.

        crc = t10_dif_crc_table[15][i[0] ^ (__u8)(crc >> 8)] ^
              t10_dif_crc_table[14][i[1] ^ (__u8)(crc >> 0)] ^
              t10_dif_crc_table[13][i[2]] ^
              t10_dif_crc_table[12][i[3]] ^
              t10_dif_crc_table[11][i[4]] ^
              ...

- Eric
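
For readers following the slicing discussion, here is a minimal user-space sketch of the slice-by-16 idea next to the byte-wise loop it replaces. It is an illustration under stated assumptions, not the patch itself. The tables are generated at run time from the 0x8bb7 polynomial rather than shipped precomputed, `uint8_t`/`uint16_t` stand in for `__u8`/`__u16`, and the names `tbl`, `crc_t10dif_build_tables`, `crc_t10dif_bytewise` and `crc_t10dif_slice16` are hypothetical. Pointer names follow the `p`, `p_end`, `p_last16` suggestion above.

```c
#include <stddef.h>
#include <stdint.h>

#define T10_POLY 0x8BB7u /* generator polynomial from the patch header comment */
#define SLICES   16

static uint16_t tbl[SLICES][256];

/* Build the tables at run time; the patch instead ships them precomputed. */
static void crc_t10dif_build_tables(void)
{
	for (unsigned int b = 0; b < 256; b++) {
		uint16_t crc = (uint16_t)(b << 8);

		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ T10_POLY)
					      : (uint16_t)(crc << 1);
		tbl[0][b] = crc;
	}
	/* tbl[k][b] is the CRC of byte b followed by k zero bytes. */
	for (int k = 1; k < SLICES; k++)
		for (unsigned int b = 0; b < 256; b++)
			tbl[k][b] = (uint16_t)((tbl[k - 1][b] << 8) ^
					       tbl[0][tbl[k - 1][b] >> 8]);
}

/* Reference: one byte per iteration, as in the loop the patch removes. */
static uint16_t crc_t10dif_bytewise(uint16_t crc, const uint8_t *p, size_t len)
{
	while (len--)
		crc = (uint16_t)((crc << 8) ^ tbl[0][((crc >> 8) ^ *p++) & 0xff]);
	return crc;
}

/* Slice-by-16: 16 table lookups per iteration, then a byte-wise tail. */
static uint16_t crc_t10dif_slice16(uint16_t crc, const uint8_t *p, size_t len)
{
	const uint8_t *p_end = p + len;
	const uint8_t *p_last16 = p + (len / 16 * 16);

	for (; p < p_last16; p += 16) {
		/* Only the first two bytes are mixed with the running CRC. */
		uint16_t next = (uint16_t)(tbl[15][p[0] ^ (uint8_t)(crc >> 8)] ^
					   tbl[14][p[1] ^ (uint8_t)crc]);

		for (int j = 2; j < 16; j++)
			next ^= tbl[15 - j][p[j]];
		crc = next;
	}
	return crc_t10dif_bytewise(crc, p, (size_t)(p_end - p));
}
```

After calling `crc_t10dif_build_tables()` once, both functions should return the same value for any buffer and starting CRC. Comparing them on buffers whose length is not a multiple of 16 also exercises the tail loop.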

2018-08-16 14:22:04

by Douglas Gilbert

Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.

Hi,
Rather than present this formally as an alternate patch, attached is a
clean-up of my patch which uses the variable size table proposed by
Joe Perches <[email protected]> and is based on the original patch that
started this thread.

Doug Gilbert

On 2018-08-16 10:02 AM, Jeffrey Lien wrote:
> Eric,
> We did not test the slice by 4 or 8 tables. I'm not sure of the value of doing that since the slice by 16 will provide the best performance gain. If I'm missing anything here, please let me know.
>
> I'm working on a new version of the patch based on the feedback from others and will also change the pointer variables to start with p and fix the indenting you mentioned below in the new version of the patch.
>
> Thanks
>
> Jeff Lien
>
> -----Original Message-----
> From: Eric Biggers [mailto:[email protected]]
> Sent: Friday, August 10, 2018 3:16 PM
> To: Jeffrey Lien <[email protected]>
> Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; David Darrington <[email protected]>; Jeff Furlong <[email protected]>
> Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.
>
> On Fri, Aug 10, 2018 at 02:12:11PM -0500, Jeff Lien wrote:
>> This patch provides a performance improvement for the CRC16
>> calculations done in read/write workloads using the T10 Type 1/2/3
>> guard field. For example, today with sequential write workloads (one
>> thread/CPU of IO) we consume 100% of the CPU because of the CRC16
>> computation bottleneck. Today's block devices are considerably
>> faster, but the CRC16 calculation prevents folks from utilizing the
>> throughput of such devices. To speed up this calculation and expose
>> the block device throughput, we slice the old single byte for loop into a 16 byte for loop, with a larger CRC table to match. The result has shown 5x performance improvements on various big endian and little endian systems running the 4.18.0 kernel version.
>>
>> FIO Sequential Write, 64K Block Size, Queue Depth 64
>> BE Base Kernel: bw=201.5 MiB/s
>> BE Modified CRC Calc: bw=968.1 MiB/s
>> 4.80x performance improvement
>>
>> LE Base Kernel: bw=357 MiB/s
>> LE Modified CRC Calc: bw=1964 MiB/s
>> 5.51x performance improvement
>>
>> FIO Sequential Read, 64K Block Size, Queue Depth 64
>> BE Base Kernel: bw=611.2 MiB/s
>> BE Modified CRC calc: bw=684.9 MiB/s
>> 1.12x performance improvement
>>
>> LE Base Kernel: bw=797 MiB/s
>> LE Modified CRC Calc: bw=2730 MiB/s
>> 3.42x performance improvement
>
> Did you also test the slice-by-4 (requires 2048-byte table) and slice-by-8 (requires 4096-byte table) methods? Your proposal is slice-by-16 (requires 8192-byte table); the original was slice-by-1 (requires 512-byte table).
>
>> __u16 crc_t10dif_generic(__u16 crc, const unsigned char *buffer,
>> size_t len) {
>> - unsigned int i;
>> + const __u8 *i = (const __u8 *)buffer;
>> + const __u8 *i_end = i + len;
>> + const __u8 *i_last16 = i + (len / 16 * 16);
>
> 'i' is normally a loop counter, not a pointer.
> Use 'p', 'p_end', and 'p_last16'.
>
>>
>> - for (i = 0 ; i < len ; i++)
>> - crc = (crc << 8) ^ t10_dif_crc_table[((crc >> 8) ^ buffer[i]) & 0xff];
>> + for (; i < i_last16; i += 16) {
>> + crc = t10_dif_crc_table[15][i[0] ^ (__u8)(crc >> 8)] ^
>> + t10_dif_crc_table[14][i[1] ^ (__u8)(crc >> 0)] ^
>> + t10_dif_crc_table[13][i[2]] ^
>> + t10_dif_crc_table[12][i[3]] ^
>> + t10_dif_crc_table[11][i[4]] ^
>> + t10_dif_crc_table[10][i[5]] ^
>> + t10_dif_crc_table[9][i[6]] ^
>> + t10_dif_crc_table[8][i[7]] ^
>> + t10_dif_crc_table[7][i[8]] ^
>> + t10_dif_crc_table[6][i[9]] ^
>> + t10_dif_crc_table[5][i[10]] ^
>> + t10_dif_crc_table[4][i[11]] ^
>> + t10_dif_crc_table[3][i[12]] ^
>> + t10_dif_crc_table[2][i[13]] ^
>> + t10_dif_crc_table[1][i[14]] ^
>> + t10_dif_crc_table[0][i[15]];
>> + }
>
> Please indent this properly.
>
> crc = t10_dif_crc_table[15][i[0] ^ (__u8)(crc >> 8)] ^
> t10_dif_crc_table[14][i[1] ^ (__u8)(crc >> 0)] ^
> t10_dif_crc_table[13][i[2]] ^
> t10_dif_crc_table[12][i[3]] ^
> t10_dif_crc_table[11][i[4]] ^
> ...
>
> - Eric
>


Attachments:
0001-T10-CRC16-function-build-time-sized-table.patch (35.29 kB)

2018-08-16 15:41:13

by Christophe Leroy

Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.

Hi,

Please include your new patch as plain text inside the mail, not as a
MIME attachment. Otherwise it is not downloadable from
https://patchwork.kernel.org/patch/10563093/

Christophe

Le 16/08/2018 à 16:22, Douglas Gilbert a écrit :
> Hi,
> Rather than present this formally as an alternate patch, attached is a
> clean-up of my patch which uses the variable size table proposed by
> Joe Perches <[email protected]> and is based on the original patch that
> started this thread.
>
> Doug Gilbert
>
> On 2018-08-16 10:02 AM, Jeffrey Lien wrote:
>> Eric,
>> We did not test the slice by 4 or 8 tables.  I'm not sure of  the
>> value of doing that since the slice by 16 will provide the best
>> performance gain.   If I'm missing anything here, please let me know.
>>
>> I'm working on a new version of the patch based on the feedback from
>> others and will also change the pointer variables to start with p and
>> fix the indenting you mentioned below in the new version of the patch.
>>
>> Thanks
>>
>> Jeff Lien
>>
>> -----Original Message-----
>> From: Eric Biggers [mailto:[email protected]]
>> Sent: Friday, August 10, 2018 3:16 PM
>> To: Jeffrey Lien <[email protected]>
>> Cc: [email protected]; [email protected];
>> [email protected]; [email protected];
>> [email protected]; [email protected];
>> [email protected]; David Darrington
>> <[email protected]>; Jeff Furlong <[email protected]>
>> Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.
>>
>> On Fri, Aug 10, 2018 at 02:12:11PM -0500, Jeff Lien wrote:
>>> This patch provides a performance improvement for the CRC16
>>> calculations done in read/write workloads using the T10 Type 1/2/3
>>> guard field.  For example, today with sequential write workloads (one
>>> thread/CPU of IO) we consume 100% of the CPU because of the CRC16
>>> computation bottleneck.  Today's block devices are considerably
>>> faster, but the CRC16 calculation prevents folks from utilizing the
>>> throughput of such devices.  To speed up this calculation and expose
>>> the block device throughput, we slice the old single byte for loop
>>> into a 16 byte for loop, with a larger CRC table to match.  The
>>> result has shown 5x performance improvements on various big endian
>>> and little endian systems running the 4.18.0 kernel version.
>>>
>>> FIO Sequential Write, 64K Block Size, Queue Depth 64
>>> BE Base Kernel:        bw=201.5 MiB/s
>>> BE Modified CRC Calc:  bw=968.1 MiB/s
>>> 4.80x performance improvement
>>>
>>> LE Base Kernel:        bw=357 MiB/s
>>> LE Modified CRC Calc:  bw=1964 MiB/s
>>> 5.51x performance improvement
>>>
>>> FIO Sequential Read, 64K Block Size, Queue Depth 64
>>> BE Base Kernel:        bw=611.2 MiB/s
>>> BE Modified CRC calc:  bw=684.9 MiB/s
>>> 1.12x performance improvement
>>>
>>> LE Base Kernel:        bw=797 MiB/s
>>> LE Modified CRC Calc:  bw=2730 MiB/s
>>> 3.42x performance improvement
>>
>> Did you also test the slice-by-4 (requires 2048-byte table) and
>> slice-by-8 (requires 4096-byte table) methods?  Your proposal is
>> slice-by-16 (requires 8192-byte table); the original was slice-by-1
>> (requires 512-byte table).
>>
>>>   __u16 crc_t10dif_generic(__u16 crc, const unsigned char *buffer,
>>> size_t len)  {
>>> -    unsigned int i;
>>> +    const __u8 *i = (const __u8 *)buffer;
>>> +    const __u8 *i_end = i + len;
>>> +    const __u8 *i_last16 = i + (len / 16 * 16);
>>
>> 'i' is normally a loop counter, not a pointer.
>> Use 'p', 'p_end', and 'p_last16'.
>>
>>> -    for (i = 0 ; i < len ; i++)
>>> -        crc = (crc << 8) ^ t10_dif_crc_table[((crc >> 8) ^
>>> buffer[i]) & 0xff];
>>> +    for (; i < i_last16; i += 16) {
>>> +        crc = t10_dif_crc_table[15][i[0] ^ (__u8)(crc >>  8)] ^
>>> +        t10_dif_crc_table[14][i[1] ^ (__u8)(crc >>  0)] ^
>>> +        t10_dif_crc_table[13][i[2]] ^
>>> +        t10_dif_crc_table[12][i[3]] ^
>>> +        t10_dif_crc_table[11][i[4]] ^
>>> +        t10_dif_crc_table[10][i[5]] ^
>>> +        t10_dif_crc_table[9][i[6]] ^
>>> +        t10_dif_crc_table[8][i[7]] ^
>>> +        t10_dif_crc_table[7][i[8]] ^
>>> +        t10_dif_crc_table[6][i[9]] ^
>>> +        t10_dif_crc_table[5][i[10]] ^
>>> +        t10_dif_crc_table[4][i[11]] ^
>>> +        t10_dif_crc_table[3][i[12]] ^
>>> +        t10_dif_crc_table[2][i[13]] ^
>>> +        t10_dif_crc_table[1][i[14]] ^
>>> +        t10_dif_crc_table[0][i[15]];
>>> +    }
>>
>> Please indent this properly.
>>
>>         crc = t10_dif_crc_table[15][i[0] ^ (__u8)(crc >>  8)] ^
>>               t10_dif_crc_table[14][i[1] ^ (__u8)(crc >>  0)] ^
>>               t10_dif_crc_table[13][i[2]] ^
>>               t10_dif_crc_table[12][i[3]] ^
>>               t10_dif_crc_table[11][i[4]] ^
>>               ...
>>
>> - Eric
>>
>

2018-08-16 15:47:25

by Christophe Leroy

Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.

Hi,

Don't be so sure that slice by 16 provides the best performance.

Some time ago, I compared slice by 4 and slice by 8 for CRC32 (both as
included in the kernel). The results on an mpc8xx (powerpc/32) were as
follows:

With CONFIG_CRC32_SLICEBY8:
[ 1.109204] crc32: CRC_LE_BITS = 64, CRC_BE BITS = 64
[ 1.114401] crc32: self tests passed, processed 225944 bytes in 15118910 nsec
[ 1.130655] crc32c: CRC_LE_BITS = 64
[ 1.134235] crc32c: self tests passed, processed 225944 bytes in 4479879 nsec

With CONFIG_CRC32_SLICEBY4:
[ 1.097129] crc32: CRC_LE_BITS = 32, CRC_BE BITS = 32
[ 1.101878] crc32: self tests passed, processed 225944 bytes in 8616242 nsec
[ 1.116298] crc32c: CRC_LE_BITS = 32
[ 1.119607] crc32c: self tests passed, processed 225944 bytes in 3289576 nsec

As you can see, slice by 4 is faster than slice by 8 on that CPU: about
8.6 ms versus 15.1 ms for crc32, and 3.3 ms versus 4.5 ms for crc32c.

So I'm sure it is worth doing the test for CRC16 as well.
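
A comparison like this could be run for the T10 DIF CRC16 with a small
user-space harness. What follows is only a minimal sketch, not the patch
under discussion: t10dif_init_tables() and t10dif_slice() are hypothetical
names, the tables are generated at run time instead of being stored as
constants, and the inner loop is left rolled for brevity, so a serious
benchmark would want an unrolled variant for each width.

```c
#include <stddef.h>
#include <stdint.h>

#define T10DIF_POLY 0x8bb7	/* generator polynomial from crct10dif_common.c */
#define MAX_SLICES  16

/* tbl[k][b] is the CRC of byte b followed by k zero bytes. */
static uint16_t tbl[MAX_SLICES][256];

static void t10dif_init_tables(void)
{
	unsigned int b, k;
	int bit;

	/* tbl[0] is the classic one-byte-at-a-time table. */
	for (b = 0; b < 256; b++) {
		uint16_t crc = (uint16_t)(b << 8);

		for (bit = 0; bit < 8; bit++)
			crc = (uint16_t)((crc & 0x8000) ?
					 (crc << 1) ^ T10DIF_POLY : (crc << 1));
		tbl[0][b] = crc;
	}
	/* Each further table advances the previous one by one zero byte. */
	for (k = 1; k < MAX_SLICES; k++)
		for (b = 0; b < 256; b++) {
			uint16_t prev = tbl[k - 1][b];

			tbl[k][b] = (uint16_t)((prev << 8) ^ tbl[0][prev >> 8]);
		}
}

/* Slice-by-n CRC, n in 1..MAX_SLICES; n == 1 is the original byte loop. */
static uint16_t t10dif_slice(uint16_t crc, const uint8_t *p, size_t len,
			     unsigned int n)
{
	unsigned int j;

	while (n > 1 && len >= n) {
		/* Only the first two bytes of a block mix in the old CRC. */
		uint16_t next = tbl[n - 1][p[0] ^ (uint8_t)(crc >> 8)] ^
				tbl[n - 2][p[1] ^ (uint8_t)crc];

		for (j = 2; j < n; j++)
			next ^= tbl[n - 1 - j][p[j]];
		crc = next;
		p += n;
		len -= n;
	}
	/* Remaining tail bytes, one at a time. */
	while (len--)
		crc = (uint16_t)((crc << 8) ^ tbl[0][(uint8_t)(crc >> 8) ^ *p++]);
	return crc;
}
```

Calling t10dif_init_tables() once and then timing t10dif_slice() over the
same buffer with n set to 1, 4, 8 and 16 would give per-CPU figures
comparable to the crc32 self-test output above. As a sanity check, the
string "123456789" should yield 0xd0db for every n.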

Christophe

Le 16/08/2018 à 16:02, Jeffrey Lien a écrit :
> Eric,
> We did not test the slice by 4 or 8 tables. I'm not sure of the value of doing that since the slice by 16 will provide the best performance gain. If I'm missing anything here, please let me know.
>
> I'm working on a new version of the patch based on the feedback from others and will also change the pointer variables to start with p and fix the indenting you mentioned below in the new version of the patch.
>
> Thanks
>
> Jeff Lien
>
> -----Original Message-----
> From: Eric Biggers [mailto:[email protected]]
> Sent: Friday, August 10, 2018 3:16 PM
> To: Jeffrey Lien <[email protected]>
> Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; David Darrington <[email protected]>; Jeff Furlong <[email protected]>
> Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.
>
> On Fri, Aug 10, 2018 at 02:12:11PM -0500, Jeff Lien wrote:
>> This patch provides a performance improvement for the CRC16
>> calculations done in read/write workloads using the T10 Type 1/2/3
>> guard field. For example, today with sequential write workloads (one
>> thread/CPU of IO) we consume 100% of the CPU because of the CRC16
>> computation bottleneck. Today's block devices are considerably
>> faster, but the CRC16 calculation prevents folks from utilizing the
>> throughput of such devices. To speed up this calculation and expose
>> the block device throughput, we slice the old single byte for loop into a 16 byte for loop, with a larger CRC table to match. The result has shown 5x performance improvements on various big endian and little endian systems running the 4.18.0 kernel version.
>>
>> FIO Sequential Write, 64K Block Size, Queue Depth 64
>> BE Base Kernel: bw=201.5 MiB/s
>> BE Modified CRC Calc: bw=968.1 MiB/s
>> 4.80x performance improvement
>>
>> LE Base Kernel: bw=357 MiB/s
>> LE Modified CRC Calc: bw=1964 MiB/s
>> 5.51x performance improvement
>>
>> FIO Sequential Read, 64K Block Size, Queue Depth 64
>> BE Base Kernel: bw=611.2 MiB/s
>> BE Modified CRC calc: bw=684.9 MiB/s
>> 1.12x performance improvement
>>
>> LE Base Kernel: bw=797 MiB/s
>> LE Modified CRC Calc: bw=2730 MiB/s
>> 3.42x performance improvement
>
> Did you also test the slice-by-4 (requires 2048-byte table) and slice-by-8 (requires 4096-byte table) methods? Your proposal is slice-by-16 (requires 8192-byte table); the original was slice-by-1 (requires 512-byte table).
>
>> __u16 crc_t10dif_generic(__u16 crc, const unsigned char *buffer,
>> size_t len) {
>> - unsigned int i;
>> + const __u8 *i = (const __u8 *)buffer;
>> + const __u8 *i_end = i + len;
>> + const __u8 *i_last16 = i + (len / 16 * 16);
>
> 'i' is normally a loop counter, not a pointer.
> Use 'p', 'p_end', and 'p_last16'.
>
>>
>> - for (i = 0 ; i < len ; i++)
>> - crc = (crc << 8) ^ t10_dif_crc_table[((crc >> 8) ^ buffer[i]) & 0xff];
>> + for (; i < i_last16; i += 16) {
>> + crc = t10_dif_crc_table[15][i[0] ^ (__u8)(crc >> 8)] ^
>> + t10_dif_crc_table[14][i[1] ^ (__u8)(crc >> 0)] ^
>> + t10_dif_crc_table[13][i[2]] ^
>> + t10_dif_crc_table[12][i[3]] ^
>> + t10_dif_crc_table[11][i[4]] ^
>> + t10_dif_crc_table[10][i[5]] ^
>> + t10_dif_crc_table[9][i[6]] ^
>> + t10_dif_crc_table[8][i[7]] ^
>> + t10_dif_crc_table[7][i[8]] ^
>> + t10_dif_crc_table[6][i[9]] ^
>> + t10_dif_crc_table[5][i[10]] ^
>> + t10_dif_crc_table[4][i[11]] ^
>> + t10_dif_crc_table[3][i[12]] ^
>> + t10_dif_crc_table[2][i[13]] ^
>> + t10_dif_crc_table[1][i[14]] ^
>> + t10_dif_crc_table[0][i[15]];
>> + }
>
> Please indent this properly.
>
> crc = t10_dif_crc_table[15][i[0] ^ (__u8)(crc >> 8)] ^
> t10_dif_crc_table[14][i[1] ^ (__u8)(crc >> 0)] ^
> t10_dif_crc_table[13][i[2]] ^
> t10_dif_crc_table[12][i[3]] ^
> t10_dif_crc_table[11][i[4]] ^
> ...
>
> - Eric
>

2018-08-16 17:38:22

by Douglas Gilbert

Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.

On 2018-08-16 11:41 AM, Christophe LEROY wrote:
> Hi,
>
> Please include your new patch as plain text inside the mail, not as a MIME
> attachment. Otherwise it is not downloadable from
> https://patchwork.kernel.org/patch/10563093/

It should be downloadable from:
http://sg.danny.cz/sg/p/0001-T10-CRC16-function-build-time-sized-table.patch

With regard to your comment about slice (table ?) size, that is partially
addressed by a kernel build time option shown in the above patch. That
could be taken a bit further with a sysfs knob (where ?) to reduce the
effective table size from that which the kernel is built with. To increase
the size of the table would imply fetching some more heap and having an
algorithm that could generate the extra part of that table required.

Doug Gilbert

> Christophe
>
> Le 16/08/2018 à 16:22, Douglas Gilbert a écrit :
>> Hi,
>> Rather than present this formally as an alternate patch, attached is a
>> clean-up of my patch which uses the variable size table proposed by
>> Joe Perches <[email protected]> and is based on the original patch that
>> started this thread.
>>
>> Doug Gilbert
>>
>> On 2018-08-16 10:02 AM, Jeffrey Lien wrote:
>>> Eric,
>>> We did not test the slice by 4 or 8 tables.  I'm not sure of  the value of
>>> doing that since the slice by 16 will provide the best performance gain.   If
>>> I'm missing anything here, please let me know.
>>>
>>> I'm working on a new version of the patch based on the feedback from others
>>> and will also change the pointer variables to start with p and fix the
>>> indenting you mentioned below in the new version of the patch.
>>>
>>> Thanks
>>>
>>> Jeff Lien
>>>
>>> -----Original Message-----
>>> From: Eric Biggers [mailto:[email protected]]
>>> Sent: Friday, August 10, 2018 3:16 PM
>>> To: Jeffrey Lien <[email protected]>
>>> Cc: [email protected]; [email protected];
>>> [email protected]; [email protected];
>>> [email protected]; [email protected];
>>> [email protected]; David Darrington <[email protected]>; Jeff
>>> Furlong <[email protected]>
>>> Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.
>>>
>>> On Fri, Aug 10, 2018 at 02:12:11PM -0500, Jeff Lien wrote:
>>>> This patch provides a performance improvement for the CRC16
>>>> calculations done in read/write workloads using the T10 Type 1/2/3
>>>> guard field.  For example, today with sequential write workloads (one
>>>> thread/CPU of IO) we consume 100% of the CPU because of the CRC16
>>>> computation bottleneck.  Today's block devices are considerably
>>>> faster, but the CRC16 calculation prevents folks from utilizing the
>>>> throughput of such devices.  To speed up this calculation and expose
>>>> the block device throughput, we slice the old single byte for loop into a 16
>>>> byte for loop, with a larger CRC table to match.  The result has shown 5x
>>>> performance improvements on various big endian and little endian systems
>>>> running the 4.18.0 kernel version.
>>>>
>>>> FIO Sequential Write, 64K Block Size, Queue Depth 64
>>>> BE Base Kernel:        bw=201.5 MiB/s
>>>> BE Modified CRC Calc:  bw=968.1 MiB/s
>>>> 4.80x performance improvement
>>>>
>>>> LE Base Kernel:        bw=357 MiB/s
>>>> LE Modified CRC Calc:  bw=1964 MiB/s
>>>> 5.51x performance improvement
>>>>
>>>> FIO Sequential Read, 64K Block Size, Queue Depth 64
>>>> BE Base Kernel:        bw=611.2 MiB/s
>>>> BE Modified CRC calc:  bw=684.9 MiB/s
>>>> 1.12x performance improvement
>>>>
>>>> LE Base Kernel:        bw=797 MiB/s
>>>> LE Modified CRC Calc:  bw=2730 MiB/s
>>>> 3.42x performance improvement
>>>
>>> Did you also test the slice-by-4 (requires 2048-byte table) and slice-by-8
>>> (requires 4096-byte table) methods?  Your proposal is slice-by-16 (requires
>>> 8192-byte table); the original was slice-by-1 (requires 512-byte table).
>>>
>>>>   __u16 crc_t10dif_generic(__u16 crc, const unsigned char *buffer,
>>>> size_t len)  {
>>>> -    unsigned int i;
>>>> +    const __u8 *i = (const __u8 *)buffer;
>>>> +    const __u8 *i_end = i + len;
>>>> +    const __u8 *i_last16 = i + (len / 16 * 16);
>>>
>>> 'i' is normally a loop counter, not a pointer.
>>> Use 'p', 'p_end', and 'p_last16'.
>>>
>>>> -    for (i = 0 ; i < len ; i++)
>>>> -        crc = (crc << 8) ^ t10_dif_crc_table[((crc >> 8) ^ buffer[i]) & 0xff];
>>>> +    for (; i < i_last16; i += 16) {
>>>> +        crc = t10_dif_crc_table[15][i[0] ^ (__u8)(crc >>  8)] ^
>>>> +        t10_dif_crc_table[14][i[1] ^ (__u8)(crc >>  0)] ^
>>>> +        t10_dif_crc_table[13][i[2]] ^
>>>> +        t10_dif_crc_table[12][i[3]] ^
>>>> +        t10_dif_crc_table[11][i[4]] ^
>>>> +        t10_dif_crc_table[10][i[5]] ^
>>>> +        t10_dif_crc_table[9][i[6]] ^
>>>> +        t10_dif_crc_table[8][i[7]] ^
>>>> +        t10_dif_crc_table[7][i[8]] ^
>>>> +        t10_dif_crc_table[6][i[9]] ^
>>>> +        t10_dif_crc_table[5][i[10]] ^
>>>> +        t10_dif_crc_table[4][i[11]] ^
>>>> +        t10_dif_crc_table[3][i[12]] ^
>>>> +        t10_dif_crc_table[2][i[13]] ^
>>>> +        t10_dif_crc_table[1][i[14]] ^
>>>> +        t10_dif_crc_table[0][i[15]];
>>>> +    }
>>>
>>> Please indent this properly.
>>>
>>>         crc = t10_dif_crc_table[15][i[0] ^ (__u8)(crc >>  8)] ^
>>>               t10_dif_crc_table[14][i[1] ^ (__u8)(crc >>  0)] ^
>>>               t10_dif_crc_table[13][i[2]] ^
>>>               t10_dif_crc_table[12][i[3]] ^
>>>               t10_dif_crc_table[11][i[4]] ^
>>>               ...
>>>
>>> - Eric
>>>
>>
>

2018-08-17 03:20:05

by Martin K. Petersen

Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.


> With regard to your comment about slice (table ?) size, that is
> partially addressed by a kernel build time option shown in the above
> patch. That could be taken a bit further with a sysfs knob (where ?)
> to reduce the effective table size from that which the kernel is built
> with. To increase the size of the table would imply fetching some more
> heap and having an algorithm that could generate the extra part of
> that table required.

I am not a big fan of punting the decision to whoever compiles the
kernel to pick a number between 1 and 11 ("this CRC calculation is one
louder"). I would prefer to find a reasonable compromise between
bandwidth and cache thrashing side effects instead of overwhelming
people with build time choices and runtime tunables.

Almost everyone is running either Tim's PCLMULQDQ version or using IP
checksum for DIX. The software T10 CRC table implementation is mainly
there as a reference. I don't know of any production environments using
the table-based T10 CRC.

I don't have a problem making the code genuinely useful so it can be
leveraged by processors without hardware CRC acceleration capability.
But there needs to be some solid data guiding this decision, so I'm
looking forward to seeing what WDC has in store.

Our results definitely matched Christophe's in that larger slice-by-N
are not always a win. And "faster" isn't automatically "better" from an
application performance perspective. With the caveat that our
measurements were done about 10 years ago and I'm sure we've come a long
way with processors and caches since then. So the results should be
interesting...

--
Martin K. Petersen Oracle Linux Engineering

2018-08-22 01:40:34

by Martin K. Petersen

Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.


> These days we obviously use the hardware-accelerated CRC calculation
> so the software table approach mostly serves as a reference
> implementation.

I was puzzled as to why WDC's tests did not seem to use the hardware-
accelerated CRC calculation whereas tests on my end worked fine. Turns
out this is due to an unfortunate side effect of how the crypto
subsystem works.

When crc-t10dif is initialized, the crypto infrastructure will pick the
algorithm with the highest priority currently registered. Both block and
SCSI will cause crc-t10dif to be compiled as a built-in so this
selection happens very early.

If crct10dif-pclmul is compiled as a module it will not be available at
the time the T10 CRC library is initialized. And thus the block layer
integrity code will be stuck with the sluggish table CRC. The workaround
is to build with CONFIG_CRYPTO_CRCT10DIF_PCLMUL=y.
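
The effect can be illustrated with a toy user-space model. This is only a
sketch under stated assumptions: struct toy_alg, toy_register() and
toy_alloc() are hypothetical stand-ins and not the crypto API, and the
priority values are illustrative. The point is that the lookup is a
snapshot, so a handle obtained early keeps pointing at whatever was best
at that moment.

```c
#include <stddef.h>

/* Toy stand-in for a registered algorithm; not the kernel's crypto_alg. */
struct toy_alg {
	const char *driver;
	int priority;
};

#define TOY_MAX_ALGS 8
static struct toy_alg toy_registry[TOY_MAX_ALGS];
static size_t toy_nr_algs;

/* Register a driver; returns 0 on success, -1 if the registry is full. */
static int toy_register(const char *driver, int priority)
{
	if (toy_nr_algs == TOY_MAX_ALGS)
		return -1;
	toy_registry[toy_nr_algs].driver = driver;
	toy_registry[toy_nr_algs].priority = priority;
	toy_nr_algs++;
	return 0;
}

/* Return the highest-priority driver registered at the time of the call. */
static const struct toy_alg *toy_alloc(void)
{
	const struct toy_alg *best = NULL;
	size_t i;

	for (i = 0; i < toy_nr_algs; i++)
		if (!best || toy_registry[i].priority > best->priority)
			best = &toy_registry[i];
	return best;
}
```

If toy_register("crct10dif-generic", 100) runs first and toy_alloc() is
called before toy_register("crct10dif-pclmul", 200), the handle returned
by that early call still names the generic driver afterwards. Only a
fresh toy_alloc() sees the accelerated one.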

However, it seems like a bit of a deficiency in crypto that there is no
way to upgrade existing transformations if higher priority algorithms
become available. btrfs and a few others work around this issue by not
using the generic lib/ CRC functions (which defeats the purpose of
having these in the first place). Instead they are registering their own
transformation at a later time where any accelerator modules are more
likely to be loaded.

Anyway. Just a heads up to people that wonder why the table algorithm is
being exercised despite their hardware supporting CRC acceleration.

--
Martin K. Petersen Oracle Linux Engineering

2018-08-22 06:20:16

by Christoph Hellwig

Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.

On Tue, Aug 21, 2018 at 09:40:34PM -0400, Martin K. Petersen wrote:
> When crc-t10dif is initialized, the crypto infrastructure will pick the
> algorithm with the highest priority currently registered. Both block and
> SCSI will cause crc-t10dif to be compiled as a built-in so this
> selection happens very early.

Ouch. This might actually happen in a lot of other users of the crypto
functionality as well.

> However, it seems like a bit of a deficiency in crypto that there is no
> way to upgrade existing transformations if higher priority algorithms
> become available. btrfs and a few others work around this issue by not
> using the generic lib/ CRC functions (which defeats the purpose of
> having these in the first place). Instead they are registering their own
> transformation at a later time where any accelerator modules are more
> likely to be loaded.

If we can't fix this in crypto (which doesn't seem that easy), we
should at least clearly document the issue somewhere, and fix this in
the t10pi code by initializing crct10dif_tfm in a lazy fashion only
once the first block device starts using it.
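
A lazy scheme along those lines might look roughly like the user-space
sketch below. It is not the t10pi code: accel_crc, chosen_crc and
lazy_crc() are hypothetical names, the generic routine is a bitwise
reference loop, and there is no locking, which a real kernel version
would need.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t (*crc_fn_t)(uint16_t crc, const uint8_t *buf, size_t len);

/* Hypothetical hook: set by an accelerated driver once it has loaded. */
static crc_fn_t accel_crc;

/* Bitwise reference CRC for polynomial 0x8bb7; slow but always present. */
static uint16_t generic_crc(uint16_t crc, const uint8_t *buf, size_t len)
{
	int bit;

	while (len--) {
		crc ^= (uint16_t)(*buf++ << 8);
		for (bit = 0; bit < 8; bit++)
			crc = (uint16_t)((crc & 0x8000) ?
					 (crc << 1) ^ 0x8bb7 : (crc << 1));
	}
	return crc;
}

/* NULL until the first caller actually needs a CRC. */
static crc_fn_t chosen_crc;

static uint16_t lazy_crc(uint16_t crc, const uint8_t *buf, size_t len)
{
	/* Decide on first use, not at init time (not thread-safe here). */
	if (!chosen_crc)
		chosen_crc = accel_crc ? accel_crc : generic_crc;
	return chosen_crc(crc, buf, len);
}
```

The choice is made when the first checksum is requested, so a driver that
sets accel_crc any time before that point is picked up. A driver that
arrives later is still missed, which is the remaining gap in this
approach.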

2018-08-24 15:32:52

by Jeffrey Lien

Subject: RE: [PATCH] Performance Improvement in CRC16 Calculations.

I rebuilt my 4.18 kernel with CONFIG_CRYPTO_CRCT10DIF_PCLMUL=y as Martin recommended and got even better performance results than with the CRC slice-by-16 changes. Here's a summary of the results:

FIO Sequential Write, 64K Block Size, Queue Depth 64
PCLMUL = y Kernel: bw = 2237 MiB/s
Slice by 16 CRC Calc: bw = 1964 MiB/s
Base Kernel: bw = 357 MiB/s

FIO Sequential Read, 64K Block Size, Queue Depth 64
PCLMUL = y Kernel: bw = 3839 MiB/s
Slice by 16 CRC Calc: bw = 2730 MiB/s
Base Kernel: bw = 797 MiB/s

So it seems that CONFIG_CRYPTO_CRCT10DIF_PCLMUL=y provides the best performance. Are there any negative side effects to this config option? If not, does it make sense to recommend that all the major distros change their config options to have CONFIG_CRYPTO_CRCT10DIF_PCLMUL=y as the default?


Jeff Lien


-----Original Message-----
From: Christoph Hellwig [mailto:[email protected]]
Sent: Wednesday, August 22, 2018 1:20 AM
To: Martin K. Petersen <[email protected]>
Cc: Jeffrey Lien <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; David Darrington <[email protected]>; Jeff Furlong <[email protected]>
Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.

On Tue, Aug 21, 2018 at 09:40:34PM -0400, Martin K. Petersen wrote:
> When crc-t10dif is initialized, the crypto infrastructure will pick
> the algorithm with the highest priority currently registered. Both
> block and SCSI will cause crc-t10dif to be compiled as a built-in so
> this selection happens very early.

Ouch. This might actually happen in a lot of other users of the crypto functionality as well.

> However, it seems like a bit of a deficiency in crypto that there is
> no way to upgrade existing transformations if higher priority
> algorithms become available. btrfs and a few others work around this
> issue by not using the generic lib/ CRC functions (which defeats the
> purpose of having these in the first place). Instead they are
> registering their own transformation at a later time where any
> accelerator modules are more likely to be loaded.

If we can't fix this in crypto (which doesn't seem that easy), we should at least clearly document the issue somewhere, and fix this in the t10pi code by initializing crct10dif_tfm in a lazy fashion only once the first block device starts using it.

2018-08-24 15:39:47

by Ard Biesheuvel

Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.

On 24 August 2018 at 16:32, Jeffrey Lien <[email protected]> wrote:
> I rebuilt my 4.18 kernel with CONFIG_CRYPTO_CRCT10DIF_PCLMUL=y as Martin recommended and got even better performance results vs the CRC Slice by 16 changes. Here's a summary of the results
>
> FIO Sequential Write, 64K Block Size, Queue Depth 64
> PCLMUL = y Kernel: bw = 2237 MiB/s
> Slice by 16 CRC Calc: bw = 1964 MiB/s
> Base Kernel: bw = 357 MiB/s
>
> FIO Sequential Read, 64K Block Size, Queue Depth 64
> PCLMUL = y Kernel: bw = 3839 MiB/s
> Slice by 16 CRC Calc: bw = 2730 MiB/s
> Base Kernel: bw = 797 MiB/s
>
> So it seems the CONFIG_CRYPTO_CRCT10DIF_PCLMUL=y provides the best performance. Are there any negative side effect to this config option? If not, does it make sense to recommend all the major distro's change their config options to have CONFIG_CRYPTO_CRCT10DIF_PCLMUL=y as the default option?
>

I think the way the library version of crc_t10dif() invokes the crypto
API should be revised.

Would it be possible to allocate the crypto transform upon first use
instead of from an initcall? If crc_t10dif() is mostly called from
non-process context, that would not really work, but otherwise, we
could simply defer it (and occasional calls from non-process context
that do occur would use the generic code until the point where another
call from process context allocates the transform).

> -----Original Message-----
> From: Christoph Hellwig [mailto:[email protected]]
> Sent: Wednesday, August 22, 2018 1:20 AM
> To: Martin K. Petersen <[email protected]>
> Cc: Jeffrey Lien <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; David Darrington <[email protected]>; Jeff Furlong <[email protected]>
> Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.
>
> On Tue, Aug 21, 2018 at 09:40:34PM -0400, Martin K. Petersen wrote:
>> When crc-t10dif is initialized, the crypto infrastructure will pick
>> the algorithm with the highest priority currently registered. Both
>> block and SCSI will cause crc-t10dif to be compiled as a built-in so
>> this selection happens very early.
>
> Ouch. This might actually happen in a lot of other users of the crypto functionality as well.
>
>> However, it seems like a bit of a deficiency in crypto that there is
>> no way to upgrade existing transformations if higher priority
>> algorithms become available. btrfs and a few others work around this
>> issue by not using the generic lib/ CRC functions (which defeats the
>> purpose of having these in the first place). Instead they are
>> registering their own transformation at a later time where any
>> accelerator modules are more likely to be loaded.
>
> If we can't fix this in crypto (which doesn't seem that easy), we should at least clearly document the issue somewhere, and fix this in the t10pi code by initializing crct10dif_tfm in a lazy fashion only once the first block device starts using it.

2018-08-24 16:29:25

by Martin K. Petersen

Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.


Ard,

> Would it be possible to allocate the crypto transform upon first use
> instead of from an initcall? If crc_t10dif() is mostly called from
> non-process context, that would not really work, but otherwise, we
> could simply defer it (and occasional calls from non-process context
> that do occur would use the generic code until the point where another
> call from process context allocates the transform)

The function is always called from user context. However, postponing the
crypto transform registration doesn't solve the common scenario of the
user booting off of a Fibre Channel/SAS/NVMe device with the desired
crct10dif-pclmul.ko module located on the boot drive.

If there is no good way to teach crypto to update existing registrations
when a higher priority transformation becomes available, then we
probably need to explore tweaking dracut to unconditionally load
crct10dif-pclmul (and your ARM equivalent). Looks like there are already
hacks in place in dracut to preload crc32c for btrfs and XFS.

Anyway. Just seems like the kernel is violating the principle of least
surprise here. The kernel should always pick the best available tool for
the job...

--
Martin K. Petersen Oracle Linux Engineering

2018-08-24 16:30:29

by Martin K. Petersen

Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.


Jeffrey,

> So it seems the CONFIG_CRYPTO_CRCT10DIF_PCLMUL=y provides the best
> performance.

Thanks for confirming!

> Are there any negative side effects to this config option?

Other than kernel image size, not really.

--
Martin K. Petersen Oracle Linux Engineering

2018-08-24 17:38:36

by Ard Biesheuvel

Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.

On 24 August 2018 at 17:29, Martin K. Petersen
<[email protected]> wrote:
>
> Ard,
>
>> Would it be possible to allocate the crypto transform upon first use
>> instead of from an initcall? If crc_t10dif() is mostly called from
>> non-process context, that would not really work, but otherwise, we
>> could simply defer it (and occasional calls from non-process context
>> that do occur would use the generic code until the point where another
>> call from process context allocates the transform)
>
> The function is always called from user context. However, postponing the
> crypto transform registration doesn't solve the common scenario of the
> user booting off of a Fibre Channel/SAS/NVMe device with the desired
> crct10dif-pclmul.ko module located on the boot drive.
>
> If there is no good way to teach crypto to update existing registrations
> when a higher priority transformation becomes available, then we
> probably need to explore tweaking dracut to unconditionally load
> crct10dif-pclmul (and your ARM equivalent). Looks like there are already
> hacks in place in dracut to preload crc32c for btrfs and XFS.
>

I'd prefer to handle this without help from userland.

It shouldn't be too difficult to register a module notifier that only
sets a flag (or the static key, even?), and to free and re-allocate
the crc_t10dif transform if the flag is set.


> Anyway. Just seems like the kernel is violating the principle of least
> surprise here. The kernel should always pick the best available tool for
> the job...
>
> --
> Martin K. Petersen Oracle Linux Engineering

2018-08-24 21:54:26

by Ard Biesheuvel

Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.

On 24 August 2018 at 22:46, Martin K. Petersen
<[email protected]> wrote:
>
> Ard,
>
>> I'd prefer to handle this without help from userland.
>>
>> It shouldn't be too difficult to register a module notifier that only
>> sets a flag (or the static key, even?), and to free and re-allocate
>> the crc_t10dif transform if the flag is set.
>
> Something like this proof of concept?
>
> diff --git a/lib/crc-t10dif.c b/lib/crc-t10dif.c
> index 1ad33e555805..87d0e8f0794a 100644
> --- a/lib/crc-t10dif.c
> +++ b/lib/crc-t10dif.c
> @@ -15,10 +15,50 @@
> #include <linux/init.h>
> #include <crypto/hash.h>
> #include <linux/static_key.h>
> +#include <linux/notifier.h>
>
> static struct crypto_shash *crct10dif_tfm;
> static struct static_key crct10dif_fallback __read_mostly;
>
> +static void crc_t10dif_print(void)
> +{
> + if (static_key_false(&crct10dif_fallback))
> + pr_info("CRC T10 DIF calculated using library function\n");
> + else
> + pr_info("CRC T10 DIF calculated using crypto hash %s\n",
> + crypto_tfm_alg_driver_name(crypto_shash_tfm(crct10dif_tfm)));
> +}
> +
> +static int crc_t10dif_rehash(struct notifier_block *self, unsigned long val, void *data)
> +{
> +#ifdef CONFIG_MODULES
> + struct module *mod = data;
> +
> + if (val != MODULE_STATE_LIVE ||
> + strncmp(mod->name, "crct10dif", strlen("crct10dif")))
> + return 0;
> +
> + /* Fall back to library function while we replace the tfm */
> + static_key_slow_inc(&crct10dif_fallback);
> +
> + crypto_free_shash(crct10dif_tfm);
> + crct10dif_tfm = crypto_alloc_shash("crct10dif", 0, 0);
> + if (IS_ERR(crct10dif_tfm)) {
> + crct10dif_tfm = NULL;
> + goto out;
> + }
> +
> + static_key_slow_dec(&crct10dif_fallback);
> +out:
> + crc_t10dif_print();
> + return 0;
> +#endif /* CONFIG_MODULES */
> +}
> +
> +static struct notifier_block crc_t10dif_nb = {
> + .notifier_call = crc_t10dif_rehash,
> +};
> +
> __u16 crc_t10dif_update(__u16 crc, const unsigned char *buffer, size_t len)
> {
> struct {
> @@ -49,16 +90,21 @@ EXPORT_SYMBOL(crc_t10dif);
>
> static int __init crc_t10dif_mod_init(void)
> {
> + register_module_notifier(&crc_t10dif_nb);
> +
> crct10dif_tfm = crypto_alloc_shash("crct10dif", 0, 0);
> if (IS_ERR(crct10dif_tfm)) {
> static_key_slow_inc(&crct10dif_fallback);
> crct10dif_tfm = NULL;
> }
> +
> + crc_t10dif_print();
> return 0;
> }
>
> static void __exit crc_t10dif_mod_fini(void)
> {
> + unregister_module_notifier(&crc_t10dif_nb);
> crypto_free_shash(crct10dif_tfm);
> }
>

This looks like it should work, yes. It does rely on the module name
to start with 'crct10dif' but I guess that is reasonable, and matches
the current state on all architectures.
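
The name check the notifier relies on is a plain prefix comparison. A minimal userspace sketch of it, assuming a hypothetical helper name (`is_crct10dif_module` is not part of the patch):

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: returns true when a module
 * name begins with the "crct10dif" prefix, mirroring the strncmp() test
 * in the proof of concept.
 */
static bool is_crct10dif_module(const char *name)
{
	static const char prefix[] = "crct10dif";

	/* sizeof(prefix) - 1 excludes the terminating NUL byte. */
	return strncmp(name, prefix, sizeof(prefix) - 1) == 0;
}
```

Any name that merely starts with the prefix passes, which is why the approach depends on every accelerated module following that naming convention.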

Anyone care to boot test this? Jeffrey?

2018-08-24 21:46:15

by Martin K. Petersen

Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.


Ard,

> I'd prefer to handle this without help from userland.
>
> It shouldn't be too difficult to register a module notifier that only
> sets a flag (or the static key, even?), and to free and re-allocate
> the crc_t10dif transform if the flag is set.

Something like this proof of concept?

diff --git a/lib/crc-t10dif.c b/lib/crc-t10dif.c
index 1ad33e555805..87d0e8f0794a 100644
--- a/lib/crc-t10dif.c
+++ b/lib/crc-t10dif.c
@@ -15,10 +15,50 @@
#include <linux/init.h>
#include <crypto/hash.h>
#include <linux/static_key.h>
+#include <linux/notifier.h>

static struct crypto_shash *crct10dif_tfm;
static struct static_key crct10dif_fallback __read_mostly;

+static void crc_t10dif_print(void)
+{
+ if (static_key_false(&crct10dif_fallback))
+ pr_info("CRC T10 DIF calculated using library function\n");
+ else
+ pr_info("CRC T10 DIF calculated using crypto hash %s\n",
+ crypto_tfm_alg_driver_name(crypto_shash_tfm(crct10dif_tfm)));
+}
+
+static int crc_t10dif_rehash(struct notifier_block *self, unsigned long val, void *data)
+{
+#ifdef CONFIG_MODULES
+ struct module *mod = data;
+
+ if (val != MODULE_STATE_LIVE ||
+ strncmp(mod->name, "crct10dif", strlen("crct10dif")))
+ return 0;
+
+ /* Fall back to library function while we replace the tfm */
+ static_key_slow_inc(&crct10dif_fallback);
+
+ crypto_free_shash(crct10dif_tfm);
+ crct10dif_tfm = crypto_alloc_shash("crct10dif", 0, 0);
+ if (IS_ERR(crct10dif_tfm)) {
+ crct10dif_tfm = NULL;
+ goto out;
+ }
+
+ static_key_slow_dec(&crct10dif_fallback);
+out:
+ crc_t10dif_print();
+ return 0;
+#endif /* CONFIG_MODULES */
+}
+
+static struct notifier_block crc_t10dif_nb = {
+ .notifier_call = crc_t10dif_rehash,
+};
+
__u16 crc_t10dif_update(__u16 crc, const unsigned char *buffer, size_t len)
{
struct {
@@ -49,16 +90,21 @@ EXPORT_SYMBOL(crc_t10dif);

static int __init crc_t10dif_mod_init(void)
{
+ register_module_notifier(&crc_t10dif_nb);
+
crct10dif_tfm = crypto_alloc_shash("crct10dif", 0, 0);
if (IS_ERR(crct10dif_tfm)) {
static_key_slow_inc(&crct10dif_fallback);
crct10dif_tfm = NULL;
}
+
+ crc_t10dif_print();
return 0;
}

static void __exit crc_t10dif_mod_fini(void)
{
+ unregister_module_notifier(&crc_t10dif_nb);
crypto_free_shash(crct10dif_tfm);
}


2018-08-24 22:12:39

by Martin K. Petersen

Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.


Ard,

> This looks like it should work, yes. It does rely on the module name
> to start with 'crct10dif' but I guess that is reasonable, and matches
> the current state on all architectures.

Yep, I verified the module names on ARM and Power. There really wasn't
much I could key off of other than the name string.

> Anyone care to boot test this? Jeffrey?

Did some quick tests on my end with both scsi_debug and FC hardware. I
verified that performance went up as I loaded crct10dif-pclmul while the
test was running:

[ 23.488059] CRC T10 DIF calculated using crypto hash crct10dif-generic
[ 156.922455] sd 0:0:0:0: [sda] Enabling DIX T10-DIF-TYPE1-CRC protection
[ 221.577731] CRC T10 DIF calculated using crypto hash crct10dif-pclmul

R 63.60 MB/s, W 61.00 MB/s, IOPS 486 | Ops 4460, Rec 0, Err 0
R 61.80 MB/s, W 62.00 MB/s, IOPS 493 | Ops 6926, Rec 0, Err 0
R 62.60 MB/s, W 60.80 MB/s, IOPS 494 | Ops 9396, Rec 0, Err 0
R 59.40 MB/s, W 58.80 MB/s, IOPS 482 | Ops 11804, Rec 0, Err 0
R 151.40 MB/s, W 155.40 MB/s, IOPS 1216 | Ops 17883, Rec 0, Err 0
R 165.00 MB/s, W 166.40 MB/s, IOPS 1327 | Ops 24520, Rec 0, Err 0
R 175.00 MB/s, W 177.40 MB/s, IOPS 1417 | Ops 31604, Rec 0, Err 0
R 185.80 MB/s, W 188.60 MB/s, IOPS 1507 | Ops 39137, Rec 0, Err 0
R 200.60 MB/s, W 203.00 MB/s, IOPS 1629 | Ops 47284, Rec 0, Err 0

Note that in this case the CRC is calculated twice per I/O (block layer
and scsi_debug).

I'll do some more testing over the weekend...

--
Martin K. Petersen Oracle Linux Engineering

2018-08-25 06:12:05

by Herbert Xu

Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.

On Fri, Aug 24, 2018 at 05:46:15PM -0400, Martin K. Petersen wrote:
>
> +#ifdef CONFIG_MODULES
> + struct module *mod = data;
> +
> + if (val != MODULE_STATE_LIVE ||
> + strncmp(mod->name, "crct10dif", strlen("crct10dif")))
> + return 0;
> +
> + /* Fall back to library function while we replace the tfm */
> + static_key_slow_inc(&crct10dif_fallback);
> +
> + crypto_free_shash(crct10dif_tfm);
> + crct10dif_tfm = crypto_alloc_shash("crct10dif", 0, 0);
> + if (IS_ERR(crct10dif_tfm)) {
> + crct10dif_tfm = NULL;
> + goto out;
> + }
> +
> + static_key_slow_dec(&crct10dif_fallback);

I don't think this is safe unless you do some kind of locking
which would slow down the data path. The easiest fix would be
to keep the old tfm around forever, or use RCU if RCU read locking
is acceptable to your use-case.
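
A minimal userspace sketch of the "keep the old tfm around forever" option, using C11 atomics in place of kernel primitives. The struct and function names are hypothetical stand-ins, not crypto API calls:

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a crypto transform. */
struct transform {
	const char *driver_name;
};

static struct transform generic_tfm = { "crct10dif-generic" };
static _Atomic(struct transform *) current_tfm = &generic_tfm;

/* Data path: a single acquire load, no lock taken. */
static const char *active_driver(void)
{
	struct transform *t =
		atomic_load_explicit(&current_tfm, memory_order_acquire);

	return t->driver_name;
}

/*
 * Update path: publish the new transform and return the old one.
 * The old transform is deliberately never freed, so a reader that
 * loaded it just before the switch can keep using it safely.
 */
static struct transform *switch_transform(struct transform *new_tfm)
{
	return atomic_exchange_explicit(&current_tfm, new_tfm,
					memory_order_acq_rel);
}
```

The cost of this variant is one leaked object per switch; RCU removes the leak by freeing the old object only after all readers that could have seen it are done.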

We already have a notifier mechanism in the crypto API for algorithm
registration events. However, it currently only notifies for untested
algorithms. So we need to add an event for tested algorithms and also
export this for use outside of the crypto API.
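
Roughly what such an event could look like from the subscriber's side, sketched in userspace C. The event names and the chain implementation here are assumptions for illustration only and do not match the kernel's notifier code:

```c
#include <stddef.h>

/* Hypothetical event codes; the third one is the proposed addition. */
enum {
	EVENT_ALG_REQUESTED,
	EVENT_ALG_REGISTERED,	/* algorithm registered, not yet tested */
	EVENT_ALG_TESTED,	/* algorithm passed its self-test */
};

struct notifier {
	int (*call)(struct notifier *self, unsigned long event, void *data);
	struct notifier *next;
};

static struct notifier *chain;

/* Add a subscriber to the head of the chain. */
static void notifier_register(struct notifier *nb)
{
	nb->next = chain;
	chain = nb;
}

/* Deliver an event to every subscriber; returns how many handled it. */
static int notify_all(unsigned long event, void *data)
{
	int handled = 0;

	for (struct notifier *nb = chain; nb; nb = nb->next)
		handled += nb->call(nb, event, data);
	return handled;
}

static int tested_count;

/* Example subscriber: ignores everything except the "tested" event. */
static int on_event(struct notifier *self, unsigned long event, void *data)
{
	(void)self;
	(void)data;
	if (event != EVENT_ALG_TESTED)
		return 0;
	tested_count++;
	return 1;
}
```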

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2018-08-26 02:35:23

by Martin K. Petersen

Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.


Herbert,

> I don't think this is safe unless you do some kind of locking
> which would slow down the data path. The easiest fix would be
> to keep the old tfm around forever, or use RCU if RCU read locking
> is acceptable to your use-case.

You're right. There's a small race there.

Patch series coming...

--
Martin K. Petersen Oracle Linux Engineering

2018-08-26 02:40:03

by Martin K. Petersen

Subject: [PATCH 1/4] crypto: Introduce notifier for new crypto algorithms

Introduce a facility that can be used to receive a notification
callback when a new algorithm becomes available. This can be used by
existing crypto registrations to trigger a switch from a software-only
algorithm to a hardware-accelerated version.

A new CRYPTO_MSG_ALG_LOADED state is introduced to the existing crypto
notification chain, and the register/unregister functions are exported
so they can be called by subsystems outside of crypto.

Signed-off-by: Martin K. Petersen <[email protected]>
Suggested-by: Herbert Xu <[email protected]>
---
crypto/algapi.c | 7 +++----
crypto/algboss.c | 2 ++
crypto/internal.h | 8 --------
include/crypto/algapi.h | 10 ++++++++++
4 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/crypto/algapi.c b/crypto/algapi.c
index c0755cf4f53f..87abad3cc322 100644
--- a/crypto/algapi.c
+++ b/crypto/algapi.c
@@ -361,15 +361,12 @@ static void crypto_wait_for_test(struct crypto_larval *larval)
err = crypto_probing_notify(CRYPTO_MSG_ALG_REGISTER, larval->adult);
if (err != NOTIFY_STOP) {
if (WARN_ON(err != NOTIFY_DONE))
- goto out;
+ return;
crypto_alg_tested(larval->alg.cra_driver_name, 0);
}

err = wait_for_completion_killable(&larval->completion);
WARN_ON(err);
-
-out:
- crypto_larval_kill(&larval->alg);
}

int crypto_register_alg(struct crypto_alg *alg)
@@ -390,6 +387,8 @@ int crypto_register_alg(struct crypto_alg *alg)
return PTR_ERR(larval);

crypto_wait_for_test(larval);
+ crypto_probing_notify(CRYPTO_MSG_ALG_LOADED, larval);
+ crypto_larval_kill(&larval->alg);
return 0;
}
EXPORT_SYMBOL_GPL(crypto_register_alg);
diff --git a/crypto/algboss.c b/crypto/algboss.c
index 5e6df2a087fa..527b44d0af21 100644
--- a/crypto/algboss.c
+++ b/crypto/algboss.c
@@ -274,6 +274,8 @@ static int cryptomgr_notify(struct notifier_block *this, unsigned long msg,
return cryptomgr_schedule_probe(data);
case CRYPTO_MSG_ALG_REGISTER:
return cryptomgr_schedule_test(data);
+ case CRYPTO_MSG_ALG_LOADED:
+ break;
}

return NOTIFY_DONE;
diff --git a/crypto/internal.h b/crypto/internal.h
index 9a3f39939fba..ef769b5e8ad3 100644
--- a/crypto/internal.h
+++ b/crypto/internal.h
@@ -26,12 +26,6 @@
#include <linux/rwsem.h>
#include <linux/slab.h>

-/* Crypto notification events. */
-enum {
- CRYPTO_MSG_ALG_REQUEST,
- CRYPTO_MSG_ALG_REGISTER,
-};
-
struct crypto_instance;
struct crypto_template;

@@ -90,8 +84,6 @@ struct crypto_alg *crypto_find_alg(const char *alg_name,
void *crypto_alloc_tfm(const char *alg_name,
const struct crypto_type *frontend, u32 type, u32 mask);

-int crypto_register_notifier(struct notifier_block *nb);
-int crypto_unregister_notifier(struct notifier_block *nb);
int crypto_probing_notify(unsigned long val, void *v);

unsigned int crypto_alg_extsize(struct crypto_alg *alg);
diff --git a/include/crypto/algapi.h b/include/crypto/algapi.h
index bd5e8ccf1687..807501a4a754 100644
--- a/include/crypto/algapi.h
+++ b/include/crypto/algapi.h
@@ -425,4 +425,14 @@ static inline void crypto_yield(u32 flags)
#endif
}

+int crypto_register_notifier(struct notifier_block *nb);
+int crypto_unregister_notifier(struct notifier_block *nb);
+
+/* Crypto notification events. */
+enum {
+ CRYPTO_MSG_ALG_REQUEST,
+ CRYPTO_MSG_ALG_REGISTER,
+ CRYPTO_MSG_ALG_LOADED,
+};
+
#endif /* _CRYPTO_ALGAPI_H */
--
2.17.1

2018-08-26 02:40:04

by Martin K. Petersen

Subject: [PATCH 2/4] crc-t10dif: Pick better transform if one becomes available

T10 CRC library is linked into the kernel thanks to block and SCSI. The
crypto accelerators are typically loaded later as modules and are
therefore not available when the T10 CRC library is initialized.

Use the crypto notifier facility to trigger a switch to a better algorithm
if one becomes available after the initial hash has been registered. Use
RCU to protect the original transform while the new one is being set up.

Suggested-by: Ard Biesheuvel <[email protected]>
Suggested-by: Herbert Xu <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
---
include/linux/crc-t10dif.h | 1 +
lib/crc-t10dif.c | 46 ++++++++++++++++++++++++++++++++++++--
2 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/include/linux/crc-t10dif.h b/include/linux/crc-t10dif.h
index 1fe0cfcdea30..6bb0c0bf357b 100644
--- a/include/linux/crc-t10dif.h
+++ b/include/linux/crc-t10dif.h
@@ -6,6 +6,7 @@

#define CRC_T10DIF_DIGEST_SIZE 2
#define CRC_T10DIF_BLOCK_SIZE 1
+#define CRC_T10DIF_STRING "crct10dif"

extern __u16 crc_t10dif_generic(__u16 crc, const unsigned char *buffer,
size_t len);
diff --git a/lib/crc-t10dif.c b/lib/crc-t10dif.c
index 1ad33e555805..72076a902df5 100644
--- a/lib/crc-t10dif.c
+++ b/lib/crc-t10dif.c
@@ -14,10 +14,47 @@
#include <linux/err.h>
#include <linux/init.h>
#include <crypto/hash.h>
+#include <crypto/algapi.h>
#include <linux/static_key.h>
+#include <linux/notifier.h>

-static struct crypto_shash *crct10dif_tfm;
+static struct crypto_shash __rcu *crct10dif_tfm;
static struct static_key crct10dif_fallback __read_mostly;
+DEFINE_SPINLOCK(crc_t10dif_mutex);
+
+static int crc_t10dif_rehash(struct notifier_block *self, unsigned long val, void *data)
+{
+ struct crypto_alg *alg = data;
+ struct crypto_shash *new, *old;
+
+ if (val != CRYPTO_MSG_ALG_LOADED ||
+ static_key_false(&crct10dif_fallback) ||
+ strncmp(alg->cra_name, CRC_T10DIF_STRING, strlen(CRC_T10DIF_STRING)))
+ return 0;
+
+ spin_lock(&crc_t10dif_mutex);
+ old = rcu_dereference_protected(crct10dif_tfm,
+ lockdep_is_held(&crc_t10dif_mutex));
+ if (!old) {
+ spin_unlock(&crc_t10dif_mutex);
+ return 0;
+ }
+ new = crypto_alloc_shash("crct10dif", 0, 0);
+ if (IS_ERR(new)) {
+ spin_unlock(&crc_t10dif_mutex);
+ return 0;
+ }
+ rcu_assign_pointer(crct10dif_tfm, new);
+ spin_unlock(&crc_t10dif_mutex);
+
+ synchronize_rcu();
+ crypto_free_shash(old);
+ return 0;
+}
+
+static struct notifier_block crc_t10dif_nb = {
+ .notifier_call = crc_t10dif_rehash,
+};

__u16 crc_t10dif_update(__u16 crc, const unsigned char *buffer, size_t len)
{
@@ -30,11 +67,14 @@ __u16 crc_t10dif_update(__u16 crc, const unsigned char *buffer, size_t len)
if (static_key_false(&crct10dif_fallback))
return crc_t10dif_generic(crc, buffer, len);

- desc.shash.tfm = crct10dif_tfm;
+ rcu_read_lock();
+ desc.shash.tfm = rcu_dereference(crct10dif_tfm);
desc.shash.flags = 0;
*(__u16 *)desc.ctx = crc;

err = crypto_shash_update(&desc.shash, buffer, len);
+ rcu_read_unlock();
+
BUG_ON(err);

return *(__u16 *)desc.ctx;
@@ -49,6 +89,7 @@ EXPORT_SYMBOL(crc_t10dif);

static int __init crc_t10dif_mod_init(void)
{
+ crypto_register_notifier(&crc_t10dif_nb);
crct10dif_tfm = crypto_alloc_shash("crct10dif", 0, 0);
if (IS_ERR(crct10dif_tfm)) {
static_key_slow_inc(&crct10dif_fallback);
@@ -59,6 +100,7 @@ static int __init crc_t10dif_mod_init(void)

static void __exit crc_t10dif_mod_fini(void)
{
+ crypto_unregister_notifier(&crc_t10dif_nb);
crypto_free_shash(crct10dif_tfm);
}

--
2.17.1

2018-08-26 02:40:05

by Martin K. Petersen

Subject: [PATCH 3/4] crc-t10dif: Allow current transform to be inspected in sysfs

Add a way to print the currently active CRC algorithm in:

/sys/module/crc_t10dif/parameters/transform

Signed-off-by: Martin K. Petersen <[email protected]>
---
lib/crc-t10dif.c | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/lib/crc-t10dif.c b/lib/crc-t10dif.c
index 72076a902df5..21c9b35a656f 100644
--- a/lib/crc-t10dif.c
+++ b/lib/crc-t10dif.c
@@ -107,6 +107,17 @@ static void __exit crc_t10dif_mod_fini(void)
module_init(crc_t10dif_mod_init);
module_exit(crc_t10dif_mod_fini);

+static int crc_t10dif_transform_show(char *buffer, const struct kernel_param *kp)
+{
+ if (static_key_false(&crct10dif_fallback))
+ return sprintf(buffer, "fallback\n");
+
+ return sprintf(buffer, "%s\n",
+ crypto_tfm_alg_driver_name(crypto_shash_tfm(crct10dif_tfm)));
+}
+
+module_param_call(transform, NULL, crc_t10dif_transform_show, NULL, 0644);
+
MODULE_DESCRIPTION("T10 DIF CRC calculation");
MODULE_LICENSE("GPL");
MODULE_SOFTDEP("pre: crct10dif");
--
2.17.1

2018-08-26 02:40:06

by Martin K. Petersen

Subject: [PATCH 4/4] block: Integrity profile init function to trigger module loads

The T10 CRC library function is built into the kernel and therefore
registered early. The hardware-accelerated CRC helpers are typically
loaded as modules and only become available later in the boot
sequence. A separate patch modifies the T10 CRC library to subscribe
to notifications from crypto and permits switching from the
table-based algorithm to a hardware accelerated ditto once the
relevant module is loaded.

However, since the dependency for "crct10dif" is already satisfied,
nothing is going to cause the hardware-accelerated kernel modules to
get loaded. Introduce an init_fn in the integrity profile that can be
called to trigger a load of modules providing the T10 CRC calculation
capability. This function will only get called when a new integrity
profile is registered during device discovery.

Signed-off-by: Martin K. Petersen <[email protected]>
---
block/blk-integrity.c | 5 +++++
block/t10-pi.c | 10 ++++++++++
include/linux/blkdev.h | 2 ++
3 files changed, 17 insertions(+)

diff --git a/block/blk-integrity.c b/block/blk-integrity.c
index 6121611e1316..5cacae9a2dc2 100644
--- a/block/blk-integrity.c
+++ b/block/blk-integrity.c
@@ -27,6 +27,7 @@
#include <linux/scatterlist.h>
#include <linux/export.h>
#include <linux/slab.h>
+#include <linux/module.h>

#include "blk.h"

@@ -391,6 +392,7 @@ static blk_status_t blk_integrity_nop_fn(struct blk_integrity_iter *iter)

static const struct blk_integrity_profile nop_profile = {
.name = "nop",
+ .init_fn = NULL,
.generate_fn = blk_integrity_nop_fn,
.verify_fn = blk_integrity_nop_fn,
};
@@ -418,6 +420,9 @@ void blk_integrity_register(struct gendisk *disk, struct blk_integrity *template
bi->tuple_size = template->tuple_size;
bi->tag_size = template->tag_size;

+ if (bi->profile->init_fn)
+ bi->profile->init_fn();
+
disk->queue->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES;
}
EXPORT_SYMBOL(blk_integrity_register);
diff --git a/block/t10-pi.c b/block/t10-pi.c
index a98db384048f..b83278f9163a 100644
--- a/block/t10-pi.c
+++ b/block/t10-pi.c
@@ -24,6 +24,7 @@
#include <linux/t10-pi.h>
#include <linux/blkdev.h>
#include <linux/crc-t10dif.h>
+#include <linux/module.h>
#include <net/checksum.h>

typedef __be16 (csum_fn) (void *, unsigned int);
@@ -157,8 +158,14 @@ static blk_status_t t10_pi_type3_verify_ip(struct blk_integrity_iter *iter)
return t10_pi_verify(iter, t10_pi_ip_fn, 3);
}

+static void t10_pi_crc_init(void)
+{
+ request_module_nowait(CRC_T10DIF_STRING);
+}
+
const struct blk_integrity_profile t10_pi_type1_crc = {
.name = "T10-DIF-TYPE1-CRC",
+ .init_fn = t10_pi_crc_init,
.generate_fn = t10_pi_type1_generate_crc,
.verify_fn = t10_pi_type1_verify_crc,
};
@@ -166,6 +173,7 @@ EXPORT_SYMBOL(t10_pi_type1_crc);

const struct blk_integrity_profile t10_pi_type1_ip = {
.name = "T10-DIF-TYPE1-IP",
+ .init_fn = NULL,
.generate_fn = t10_pi_type1_generate_ip,
.verify_fn = t10_pi_type1_verify_ip,
};
@@ -173,6 +181,7 @@ EXPORT_SYMBOL(t10_pi_type1_ip);

const struct blk_integrity_profile t10_pi_type3_crc = {
.name = "T10-DIF-TYPE3-CRC",
+ .init_fn = t10_pi_crc_init,
.generate_fn = t10_pi_type3_generate_crc,
.verify_fn = t10_pi_type3_verify_crc,
};
@@ -180,6 +189,7 @@ EXPORT_SYMBOL(t10_pi_type3_crc);

const struct blk_integrity_profile t10_pi_type3_ip = {
.name = "T10-DIF-TYPE3-IP",
+ .init_fn = NULL,
.generate_fn = t10_pi_type3_generate_ip,
.verify_fn = t10_pi_type3_verify_ip,
};
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 79226ca8f80f..a43c02e4f43d 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1806,10 +1806,12 @@ struct blk_integrity_iter {
};

typedef blk_status_t (integrity_processing_fn) (struct blk_integrity_iter *);
+typedef void (integrity_init_fn) (void);

struct blk_integrity_profile {
integrity_processing_fn *generate_fn;
integrity_processing_fn *verify_fn;
+ integrity_init_fn *init_fn;
const char *name;
};

--
2.17.1

2018-08-26 08:22:43

by Ard Biesheuvel

Subject: Re: [PATCH 4/4] block: Integrity profile init function to trigger module loads

Hi Martin,

On 26 August 2018 at 03:40, Martin K. Petersen
<[email protected]> wrote:
> The T10 CRC library function is built into the kernel and therefore
> registered early. The hardware-accelerated CRC helpers are typically
> loaded as modules and only become available later in the boot
> sequence. A separate patch modifies the T10 CRC library to subscribe
> to notifications from crypto and permits switching from the
> table-based algorithm to a hardware accelerated ditto once the
> relevant module is loaded.
>
> However, since the dependency for "crct10dif" is already satisfied,
> nothing is going to cause the hardware-accelerated kernel modules to
> get loaded.

This is not true. All accelerated implementations based on SIMD
polynomial multiplication are tied to the respective CPU feature
bits. This applies to x86, power, ARM and arm64.

E.g., for x86 you have

alias: cpu:type:x86,ven*fam*mod*:feature:*0081*

which will be matched by udev if /sys/devices/system/cpu/modalias
contains feature 0081, and so the modules will be loaded automatically
at boot.
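
The alias is a glob pattern that gets compared against the CPU modalias string. A minimal sketch of that comparison, assuming POSIX fnmatch(3) as a stand-in for whatever matcher the module tools actually use; the helper name is hypothetical:

```c
#include <fnmatch.h>
#include <stdbool.h>

/*
 * Hypothetical helper: true when a module alias pattern matches the
 * given modalias string. fnmatch() returns 0 on a match.
 */
static bool alias_matches(const char *alias_pattern, const char *modalias)
{
	return fnmatch(alias_pattern, modalias, 0) == 0;
}
```

For instance, `alias_matches("cpu:type:x86,ven*fam*mod*:feature:*0081*", s)` is true for any `s` of that shape whose feature list contains 0081; the exact contents of the modalias file vary by CPU.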



> Introduce an init_fn in the integrity profile that can be
> called to trigger a load of modules providing the T10 CRC calculation
> capability. This function will only get called when a new integrity
> profile is registered during device discovery.
>
> Signed-off-by: Martin K. Petersen <[email protected]>
> ---
> block/blk-integrity.c | 5 +++++
> block/t10-pi.c | 10 ++++++++++
> include/linux/blkdev.h | 2 ++
> 3 files changed, 17 insertions(+)
>
> diff --git a/block/blk-integrity.c b/block/blk-integrity.c
> index 6121611e1316..5cacae9a2dc2 100644
> --- a/block/blk-integrity.c
> +++ b/block/blk-integrity.c
> @@ -27,6 +27,7 @@
> #include <linux/scatterlist.h>
> #include <linux/export.h>
> #include <linux/slab.h>
> +#include <linux/module.h>
>
> #include "blk.h"
>
> @@ -391,6 +392,7 @@ static blk_status_t blk_integrity_nop_fn(struct blk_integrity_iter *iter)
>
> static const struct blk_integrity_profile nop_profile = {
> .name = "nop",
> + .init_fn = NULL,
> .generate_fn = blk_integrity_nop_fn,
> .verify_fn = blk_integrity_nop_fn,
> };
> @@ -418,6 +420,9 @@ void blk_integrity_register(struct gendisk *disk, struct blk_integrity *template
> bi->tuple_size = template->tuple_size;
> bi->tag_size = template->tag_size;
>
> + if (bi->profile->init_fn)
> + bi->profile->init_fn();
> +
> disk->queue->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES;
> }
> EXPORT_SYMBOL(blk_integrity_register);
> diff --git a/block/t10-pi.c b/block/t10-pi.c
> index a98db384048f..b83278f9163a 100644
> --- a/block/t10-pi.c
> +++ b/block/t10-pi.c
> @@ -24,6 +24,7 @@
> #include <linux/t10-pi.h>
> #include <linux/blkdev.h>
> #include <linux/crc-t10dif.h>
> +#include <linux/module.h>
> #include <net/checksum.h>
>
> typedef __be16 (csum_fn) (void *, unsigned int);
> @@ -157,8 +158,14 @@ static blk_status_t t10_pi_type3_verify_ip(struct blk_integrity_iter *iter)
> return t10_pi_verify(iter, t10_pi_ip_fn, 3);
> }
>
> +static void t10_pi_crc_init(void)
> +{
> + request_module_nowait(CRC_T10DIF_STRING);
> +}
> +
> const struct blk_integrity_profile t10_pi_type1_crc = {
> .name = "T10-DIF-TYPE1-CRC",
> + .init_fn = t10_pi_crc_init,
> .generate_fn = t10_pi_type1_generate_crc,
> .verify_fn = t10_pi_type1_verify_crc,
> };
> @@ -166,6 +173,7 @@ EXPORT_SYMBOL(t10_pi_type1_crc);
>
> const struct blk_integrity_profile t10_pi_type1_ip = {
> .name = "T10-DIF-TYPE1-IP",
> + .init_fn = NULL,
> .generate_fn = t10_pi_type1_generate_ip,
> .verify_fn = t10_pi_type1_verify_ip,
> };
> @@ -173,6 +181,7 @@ EXPORT_SYMBOL(t10_pi_type1_ip);
>
> const struct blk_integrity_profile t10_pi_type3_crc = {
> .name = "T10-DIF-TYPE3-CRC",
> + .init_fn = t10_pi_crc_init,
> .generate_fn = t10_pi_type3_generate_crc,
> .verify_fn = t10_pi_type3_verify_crc,
> };
> @@ -180,6 +189,7 @@ EXPORT_SYMBOL(t10_pi_type3_crc);
>
> const struct blk_integrity_profile t10_pi_type3_ip = {
> .name = "T10-DIF-TYPE3-IP",
> + .init_fn = NULL,
> .generate_fn = t10_pi_type3_generate_ip,
> .verify_fn = t10_pi_type3_verify_ip,
> };
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 79226ca8f80f..a43c02e4f43d 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1806,10 +1806,12 @@ struct blk_integrity_iter {
> };
>
> typedef blk_status_t (integrity_processing_fn) (struct blk_integrity_iter *);
> +typedef void (integrity_init_fn) (void);
>
> struct blk_integrity_profile {
> integrity_processing_fn *generate_fn;
> integrity_processing_fn *verify_fn;
> + integrity_init_fn *init_fn;
> const char *name;
> };
>
> --
> 2.17.1
>

2018-08-26 13:30:39

by Martin K. Petersen

Subject: Re: [PATCH 4/4] block: Integrity profile init function to trigger module loads


Hi Ard,

>> However, since the dependency for "crct10dif" is already satisfied,
>> nothing is going to cause the hardware-accelerated kernel modules to
>> get loaded.
>
> This is not true. All accelerated implementations based on SIMD
> polynomial multiplication are tied to the respective CPU feature
> bits. This applies to x86, power, ARM and arm64.
>
> E.g., for x86 you have
>
> alias: cpu:type:x86,ven*fam*mod*:feature:*0081*
>
> which will be matched by udev if /sys/devices/system/cpu/modalias
> contains feature 0081, and so the modules will be loaded automatically
> at boot.

If I can avoid carrying that init callback in the block integrity code
that will definitely make me happy. However, loading crct10dif-pclmul
does not happen automatically for me. crc-t10dif is linked statically
and every user of the CRC goes through that library. So nothing ever
requests the "crct10dif" modalias and no accelerator modules are loaded.

<fresh boot>

# lsmod | grep crc
crc32c_intel 24576 0
crc_ccitt 16384 1 ipv6

# modinfo crc32c_intel | grep cpu:type
alias: cpu:type:x86,ven*fam*mod*:feature:*0094*

# modinfo crct10dif-pclmul | grep cpu:type
alias: cpu:type:x86,ven*fam*mod*:feature:*0081*

# egrep -o "0081|0094" /sys/devices/system/cpu/modalias
0081
0094

# modprobe crct10dif
# lsmod | grep crc
crct10dif_pclmul 16384 1
crc32c_intel 24576 0
crc_ccitt 16384 1 ipv6

It's interesting that crc32c_intel is loaded but libcrc32c is not. That
matches your description of how things should work. But crct10dif-pclmul
isn't loaded and neither is crc32_pclmul:

# modprobe crc32
# lsmod | grep crc
crc32_generic 16384 0
crc32_pclmul 16384 0
crc32c_intel 24576 0
crc_ccitt 16384 1 ipv6

--
Martin K. Petersen Oracle Linux Engineering

2018-08-26 13:44:31

by Ard Biesheuvel

Subject: Re: [PATCH 4/4] block: Integrity profile init function to trigger module loads

On 26 August 2018 at 15:30, Martin K. Petersen
<[email protected]> wrote:
>
> Hi Ard,
>
>>> However, since the dependency for "crct10dif" is already satisfied,
>>> nothing is going to cause the hardware-accelerated kernel modules to
>>> get loaded.
>>
>> This is not true. All accelerated implementations based on SIMD
>> polynomial multiplication are tied to the respective CPU feature
>> bits. This applies to x86, power, ARM and arm64.
>>
>> E.g., for x86 you have
>>
>> alias: cpu:type:x86,ven*fam*mod*:feature:*0081*
>>
>> which will be matched by udev if /sys/devices/system/cpu/modalias
>> contains feature 0081, and so the modules will be loaded automatically
>> at boot.
>
> If I can avoid carrying that init callback in the block integrity code
> that will definitely make me happy. However, loading crct10dif-pclmul
> does not happen automatically for me. crc-t10dif is linked statically
> and every user of the CRC goes through that library. So nothing ever
> requests the "crct10dif" modalias and no accelerator modules are loaded.
>
> <fresh boot>
>
> # lsmod | grep crc
> crc32c_intel 24576 0
> crc_ccitt 16384 1 ipv6
>
> # modinfo crc32c_intel | grep cpu:type
> alias: cpu:type:x86,ven*fam*mod*:feature:*0094*
>
> # modinfo crct10dif-pclmul | grep cpu:type
> alias: cpu:type:x86,ven*fam*mod*:feature:*0081*
>
> # egrep -o "0081|0094" /sys/devices/system/cpu/modalias
> 0081
> 0094
>
> # modprobe crct10dif
> # lsmod | grep crc
> crct10dif_pclmul 16384 1
> crc32c_intel 24576 0
> crc_ccitt 16384 1 ipv6
>
> It's interesting that crc32c_intel is loaded but libcrc32c is not. That
> matches your description of how things should work. But crct10dif-pclmul
> isn't loaded and neither is crc32_pclmul:
>
> # modprobe crc32
> # lsmod | grep crc
> crc32_generic 16384 0
> crc32_pclmul 16384 0
> crc32c_intel 24576 0
> crc_ccitt 16384 1 ipv6
>

That is odd. On my Ubuntu system, both crct10dif_pclmul and
crc32_pclmul get loaded automatically.

2018-08-26 13:48:55

by Martin K. Petersen

Subject: Re: [PATCH 4/4] block: Integrity profile init function to trigger module loads


Ard,

> That is odd. On my Ubuntu system, both crct10dif_pclmul and
> crc32_pclmul get loaded automatically.

Just checked my Fedora box and they are loaded there too. Peculiar. I'll
keep digging...

(crc32c-intel is brought in by dracut on RHEL/OL/Fedora, fwiw)

--
Martin K. Petersen Oracle Linux Engineering

2018-08-27 06:09:19

by Herbert Xu

Subject: Re: [PATCH 1/4] crypto: Introduce notifier for new crypto algorithms

On Sat, Aug 25, 2018 at 10:40:03PM -0400, Martin K. Petersen wrote:
>
> diff --git a/crypto/algapi.c b/crypto/algapi.c
> index c0755cf4f53f..87abad3cc322 100644
> --- a/crypto/algapi.c
> +++ b/crypto/algapi.c
> @@ -361,15 +361,12 @@ static void crypto_wait_for_test(struct crypto_larval *larval)
> err = crypto_probing_notify(CRYPTO_MSG_ALG_REGISTER, larval->adult);
> if (err != NOTIFY_STOP) {
> if (WARN_ON(err != NOTIFY_DONE))
> - goto out;
> + return;
> crypto_alg_tested(larval->alg.cra_driver_name, 0);
> }
>
> err = wait_for_completion_killable(&larval->completion);
> WARN_ON(err);
> -
> -out:
> - crypto_larval_kill(&larval->alg);
> }
>
> int crypto_register_alg(struct crypto_alg *alg)
> @@ -390,6 +387,8 @@ int crypto_register_alg(struct crypto_alg *alg)
> return PTR_ERR(larval);
>
> crypto_wait_for_test(larval);
> + crypto_probing_notify(CRYPTO_MSG_ALG_LOADED, larval);
> + crypto_larval_kill(&larval->alg);

I see that you have moved the larval_kill call into the caller.
This is a problem because there are two callers to wait_for_test.

So it's probably best to leave it in wait_for_test and add the
notify there.

Thanks,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2018-08-27 06:13:26

by Herbert Xu

Subject: Re: [PATCH 2/4] crc-t10dif: Pick better transform if one becomes available

On Sat, Aug 25, 2018 at 07:40:04PM -0700, Martin K. Petersen wrote:
>
> +static int crc_t10dif_rehash(struct notifier_block *self, unsigned long val, void *data)
> +{
> + struct crypto_alg *alg = data;
> + struct crypto_shash *new, *old;
> +
> + if (val != CRYPTO_MSG_ALG_LOADED ||
> + static_key_false(&crct10dif_fallback) ||
> + strncmp(alg->cra_name, CRC_T10DIF_STRING, strlen(CRC_T10DIF_STRING)))
> + return 0;
> +
> + spin_lock(&crc_t10dif_mutex);
> + old = rcu_dereference_protected(crct10dif_tfm,
> + lockdep_is_held(&crc_t10dif_mutex));
> + if (!old) {
> + spin_unlock(&crc_t10dif_mutex);
> + return 0;
> + }
> + new = crypto_alloc_shash("crct10dif", 0, 0);

You cannot allocate crypto tfm objects while holding a spin lock
because they need to allocate memory with GFP_KERNEL.

How about using a mutex?

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2018-08-30 14:57:46

by Martin K. Petersen

Subject: Re: [PATCH 1/4] crypto: Introduce notifier for new crypto algorithms


Herbert,

Sorry about the delay. Been on the road with surprisingly spotty
internet.

I addressed your two comments in the patch series that will follow.

--
Martin K. Petersen Oracle Linux Engineering

2018-08-30 15:00:14

by Martin K. Petersen

Subject: [PATCH v2 1/3] crypto: Introduce notifier for new crypto algorithms

Introduce a facility that can be used to receive a notification
callback when a new algorithm becomes available. This can be used by
existing crypto registrations to trigger a switch from a software-only
algorithm to a hardware-accelerated version.

A new CRYPTO_MSG_ALG_LOADED state is introduced to the existing crypto
notification chain, and the register/unregister functions are exported
so they can be called by subsystems outside of crypto.

Signed-off-by: Martin K. Petersen <[email protected]>
Suggested-by: Herbert Xu <[email protected]>
---
crypto/algapi.c | 2 ++
crypto/algboss.c | 2 ++
crypto/internal.h | 8 --------
include/crypto/algapi.h | 10 ++++++++++
4 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/crypto/algapi.c b/crypto/algapi.c
index c0755cf4f53f..33522a147412 100644
--- a/crypto/algapi.c
+++ b/crypto/algapi.c
@@ -367,6 +367,8 @@ static void crypto_wait_for_test(struct crypto_larval *larval)

err = wait_for_completion_killable(&larval->completion);
WARN_ON(err);
+ if (!err)
+ crypto_probing_notify(CRYPTO_MSG_ALG_LOADED, larval);

out:
crypto_larval_kill(&larval->alg);
diff --git a/crypto/algboss.c b/crypto/algboss.c
index 5e6df2a087fa..527b44d0af21 100644
--- a/crypto/algboss.c
+++ b/crypto/algboss.c
@@ -274,6 +274,8 @@ static int cryptomgr_notify(struct notifier_block *this, unsigned long msg,
return cryptomgr_schedule_probe(data);
case CRYPTO_MSG_ALG_REGISTER:
return cryptomgr_schedule_test(data);
+ case CRYPTO_MSG_ALG_LOADED:
+ break;
}

return NOTIFY_DONE;
diff --git a/crypto/internal.h b/crypto/internal.h
index 9a3f39939fba..ef769b5e8ad3 100644
--- a/crypto/internal.h
+++ b/crypto/internal.h
@@ -26,12 +26,6 @@
#include <linux/rwsem.h>
#include <linux/slab.h>

-/* Crypto notification events. */
-enum {
- CRYPTO_MSG_ALG_REQUEST,
- CRYPTO_MSG_ALG_REGISTER,
-};
-
struct crypto_instance;
struct crypto_template;

@@ -90,8 +84,6 @@ struct crypto_alg *crypto_find_alg(const char *alg_name,
void *crypto_alloc_tfm(const char *alg_name,
const struct crypto_type *frontend, u32 type, u32 mask);

-int crypto_register_notifier(struct notifier_block *nb);
-int crypto_unregister_notifier(struct notifier_block *nb);
int crypto_probing_notify(unsigned long val, void *v);

unsigned int crypto_alg_extsize(struct crypto_alg *alg);
diff --git a/include/crypto/algapi.h b/include/crypto/algapi.h
index bd5e8ccf1687..807501a4a754 100644
--- a/include/crypto/algapi.h
+++ b/include/crypto/algapi.h
@@ -425,4 +425,14 @@ static inline void crypto_yield(u32 flags)
#endif
}

+int crypto_register_notifier(struct notifier_block *nb);
+int crypto_unregister_notifier(struct notifier_block *nb);
+
+/* Crypto notification events. */
+enum {
+ CRYPTO_MSG_ALG_REQUEST,
+ CRYPTO_MSG_ALG_REGISTER,
+ CRYPTO_MSG_ALG_LOADED,
+};
+
#endif /* _CRYPTO_ALGAPI_H */
--
2.17.1
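
The notifier facility described in this patch can be pictured with a small userspace sketch. It is only an illustration under stated assumptions: `struct notifier`, `chain_register()`, `chain_unregister()`, `chain_notify()` and the `MSG_*` names are hypothetical stand-ins, not the kernel's `notifier_block` API, and the kernel chain also handles locking and priorities, which are omitted here. The subscriber ignores every event except the "loaded" one, which is the filtering a consumer of CRYPTO_MSG_ALG_LOADED would need to do.

```c
#include <stddef.h>

/* Event codes mirroring the three-entry enum in the patch (names are stand-ins). */
enum {
	MSG_ALG_REQUEST,
	MSG_ALG_REGISTER,
	MSG_ALG_LOADED,
};

#define NOTIFY_DONE 0

/* Hypothetical userspace stand-in for a notifier block. */
struct notifier {
	int (*call)(struct notifier *self, unsigned long val, void *data);
	struct notifier *next;
};

static struct notifier *chain_head;

/* Add a subscriber at the head of the chain. */
static int chain_register(struct notifier *nb)
{
	nb->next = chain_head;
	chain_head = nb;
	return 0;
}

/* Remove a subscriber; returns -1 if it is not on the chain. */
static int chain_unregister(struct notifier *nb)
{
	struct notifier **pp;

	for (pp = &chain_head; *pp; pp = &(*pp)->next) {
		if (*pp == nb) {
			*pp = nb->next;
			nb->next = NULL;
			return 0;
		}
	}
	return -1;
}

/* Deliver an event to every subscriber; returns how many were called. */
static int chain_notify(unsigned long val, void *data)
{
	struct notifier *nb;
	int called = 0;

	for (nb = chain_head; nb; nb = nb->next) {
		nb->call(nb, val, data);
		called++;
	}
	return called;
}

/* Example subscriber: reacts only to "loaded" events. */
static int loaded_events;

static int on_event(struct notifier *self, unsigned long val, void *data)
{
	(void)self;
	(void)data;
	if (val != MSG_ALG_LOADED)
		return NOTIFY_DONE;
	loaded_events++;
	return NOTIFY_DONE;
}

static struct notifier example_nb = { .call = on_event };
```

A caller would register `example_nb` once, after which every `chain_notify()` reaches `on_event()`; only MSG_ALG_LOADED changes `loaded_events`.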

2018-08-30 15:00:15

by Martin K. Petersen

Subject: [PATCH v2 2/3] crc-t10dif: Pick better transform if one becomes available

T10 CRC library is linked into the kernel thanks to block and SCSI. The
crypto accelerators are typically loaded later as modules and are
therefore not available when the T10 CRC library is initialized.

Use the crypto notifier facility to trigger a switch to a better algorithm
if one becomes available after the initial hash has been registered. Use
RCU to protect the original transform while the new one is being set up.

Suggested-by: Ard Biesheuvel <[email protected]>
Suggested-by: Herbert Xu <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
---
include/linux/crc-t10dif.h | 1 +
lib/crc-t10dif.c | 46 ++++++++++++++++++++++++++++++++++++--
2 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/include/linux/crc-t10dif.h b/include/linux/crc-t10dif.h
index 1fe0cfcdea30..6bb0c0bf357b 100644
--- a/include/linux/crc-t10dif.h
+++ b/include/linux/crc-t10dif.h
@@ -6,6 +6,7 @@

#define CRC_T10DIF_DIGEST_SIZE 2
#define CRC_T10DIF_BLOCK_SIZE 1
+#define CRC_T10DIF_STRING "crct10dif"

extern __u16 crc_t10dif_generic(__u16 crc, const unsigned char *buffer,
size_t len);
diff --git a/lib/crc-t10dif.c b/lib/crc-t10dif.c
index 1ad33e555805..52f577a3868d 100644
--- a/lib/crc-t10dif.c
+++ b/lib/crc-t10dif.c
@@ -14,10 +14,47 @@
#include <linux/err.h>
#include <linux/init.h>
#include <crypto/hash.h>
+#include <crypto/algapi.h>
#include <linux/static_key.h>
+#include <linux/notifier.h>

-static struct crypto_shash *crct10dif_tfm;
+static struct crypto_shash __rcu *crct10dif_tfm;
static struct static_key crct10dif_fallback __read_mostly;
+DEFINE_MUTEX(crc_t10dif_mutex);
+
+static int crc_t10dif_rehash(struct notifier_block *self, unsigned long val, void *data)
+{
+ struct crypto_alg *alg = data;
+ struct crypto_shash *new, *old;
+
+ if (val != CRYPTO_MSG_ALG_LOADED ||
+ static_key_false(&crct10dif_fallback) ||
+ strncmp(alg->cra_name, CRC_T10DIF_STRING, strlen(CRC_T10DIF_STRING)))
+ return 0;
+
+ mutex_lock(&crc_t10dif_mutex);
+ old = rcu_dereference_protected(crct10dif_tfm,
+ lockdep_is_held(&crc_t10dif_mutex));
+ if (!old) {
+ mutex_unlock(&crc_t10dif_mutex);
+ return 0;
+ }
+ new = crypto_alloc_shash("crct10dif", 0, 0);
+ if (IS_ERR(new)) {
+ mutex_unlock(&crc_t10dif_mutex);
+ return 0;
+ }
+ rcu_assign_pointer(crct10dif_tfm, new);
+ mutex_unlock(&crc_t10dif_mutex);
+
+ synchronize_rcu();
+ crypto_free_shash(old);
+ return 0;
+}
+
+static struct notifier_block crc_t10dif_nb = {
+ .notifier_call = crc_t10dif_rehash,
+};

__u16 crc_t10dif_update(__u16 crc, const unsigned char *buffer, size_t len)
{
@@ -30,11 +67,14 @@ __u16 crc_t10dif_update(__u16 crc, const unsigned char *buffer, size_t len)
if (static_key_false(&crct10dif_fallback))
return crc_t10dif_generic(crc, buffer, len);

- desc.shash.tfm = crct10dif_tfm;
+ rcu_read_lock();
+ desc.shash.tfm = rcu_dereference(crct10dif_tfm);
desc.shash.flags = 0;
*(__u16 *)desc.ctx = crc;

err = crypto_shash_update(&desc.shash, buffer, len);
+ rcu_read_unlock();
+
BUG_ON(err);

return *(__u16 *)desc.ctx;
@@ -49,6 +89,7 @@ EXPORT_SYMBOL(crc_t10dif);

static int __init crc_t10dif_mod_init(void)
{
+ crypto_register_notifier(&crc_t10dif_nb);
crct10dif_tfm = crypto_alloc_shash("crct10dif", 0, 0);
if (IS_ERR(crct10dif_tfm)) {
static_key_slow_inc(&crct10dif_fallback);
@@ -59,6 +100,7 @@ static int __init crc_t10dif_mod_init(void)

static void __exit crc_t10dif_mod_fini(void)
{
+ crypto_unregister_notifier(&crc_t10dif_nb);
crypto_free_shash(crct10dif_tfm);
}

--
2.17.1

2018-08-30 15:00:16

by Martin K. Petersen

Subject: [PATCH v2 3/3] crc-t10dif: Allow current transform to be inspected in sysfs

Add a way to print the currently active CRC algorithm in:

/sys/module/crc_t10dif/parameters/transform

Signed-off-by: Martin K. Petersen <[email protected]>
---
lib/crc-t10dif.c | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/lib/crc-t10dif.c b/lib/crc-t10dif.c
index 52f577a3868d..6507fa101eff 100644
--- a/lib/crc-t10dif.c
+++ b/lib/crc-t10dif.c
@@ -107,6 +107,17 @@ static void __exit crc_t10dif_mod_fini(void)
module_init(crc_t10dif_mod_init);
module_exit(crc_t10dif_mod_fini);

+static int crc_t10dif_transform_show(char *buffer, const struct kernel_param *kp)
+{
+ if (static_key_false(&crct10dif_fallback))
+ return sprintf(buffer, "fallback\n");
+
+ return sprintf(buffer, "%s\n",
+ crypto_tfm_alg_driver_name(crypto_shash_tfm(crct10dif_tfm)));
+}
+
+module_param_call(transform, NULL, crc_t10dif_transform_show, NULL, 0644);
+
MODULE_DESCRIPTION("T10 DIF CRC calculation");
MODULE_LICENSE("GPL");
MODULE_SOFTDEP("pre: crct10dif");
--
2.17.1
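
One way to check the new parameter from a test program is to read the file named in the commit message. The helper below is a minimal sketch: `read_first_line()` is a hypothetical name, and the only thing taken from the patch is the path and the fact that the file holds one line (either "fallback" or a driver name).

```c
#include <stdio.h>
#include <string.h>

/*
 * Read the first line of `path` into `buf`, stripping the trailing newline.
 * Returns 0 on success, -1 if the file cannot be opened or read.
 */
static int read_first_line(const char *path, char *buf, size_t len)
{
	FILE *f;

	if (len == 0)
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}
```

Called with "/sys/module/crc_t10dif/parameters/transform" on a kernel carrying this patch, it would return the currently selected transform; on a kernel without the patch the call simply fails with -1.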

2018-08-31 17:17:34

by Jeffrey Lien

Subject: RE: [PATCH v2 1/3] crypto: Introduce notifier for new crypto algorithms

Martin,
I tried out this latest series of patches on my system and the performance matches what I saw with CONFIG_CRYPTO_CRCT10DIF_PCLMUL=y, as expected.

In this case, I applied your patches and built with CONFIG_CRYPTO_CRCT10DIF_PCLMUL=m.

Is there anything else I should be checking or verifying?


Jeff Lien

-----Original Message-----
From: Martin K. Petersen [mailto:[email protected]]
Sent: Thursday, August 30, 2018 10:00 AM
To: [email protected]
Cc: Jeffrey Lien <[email protected]>; [email protected]; David Darrington <[email protected]>; [email protected]; Jeff Furlong <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]
Subject: [PATCH v2 1/3] crypto: Introduce notifier for new crypto algorithms

Introduce a facility that can be used to receive a notification callback when a new algorithm becomes available. This can be used by existing crypto registrations to trigger a switch from a software-only algorithm to a hardware-accelerated version.

A new CRYPTO_MSG_ALG_LOADED state is introduced to the existing crypto notification chain, and the register/unregister functions are exported so they can be called by subsystems outside of crypto.

Signed-off-by: Martin K. Petersen <[email protected]>
Suggested-by: Herbert Xu <[email protected]>
---
crypto/algapi.c | 2 ++
crypto/algboss.c | 2 ++
crypto/internal.h | 8 --------
include/crypto/algapi.h | 10 ++++++++++
4 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/crypto/algapi.c b/crypto/algapi.c
index c0755cf4f53f..33522a147412 100644
--- a/crypto/algapi.c
+++ b/crypto/algapi.c
@@ -367,6 +367,8 @@ static void crypto_wait_for_test(struct crypto_larval *larval)

err = wait_for_completion_killable(&larval->completion);
WARN_ON(err);
+ if (!err)
+ crypto_probing_notify(CRYPTO_MSG_ALG_LOADED, larval);

out:
crypto_larval_kill(&larval->alg);
diff --git a/crypto/algboss.c b/crypto/algboss.c
index 5e6df2a087fa..527b44d0af21 100644
--- a/crypto/algboss.c
+++ b/crypto/algboss.c
@@ -274,6 +274,8 @@ static int cryptomgr_notify(struct notifier_block *this, unsigned long msg,
return cryptomgr_schedule_probe(data);
case CRYPTO_MSG_ALG_REGISTER:
return cryptomgr_schedule_test(data);
+ case CRYPTO_MSG_ALG_LOADED:
+ break;
}

return NOTIFY_DONE;
diff --git a/crypto/internal.h b/crypto/internal.h
index 9a3f39939fba..ef769b5e8ad3 100644
--- a/crypto/internal.h
+++ b/crypto/internal.h
@@ -26,12 +26,6 @@
#include <linux/rwsem.h>
#include <linux/slab.h>

-/* Crypto notification events. */
-enum {
- CRYPTO_MSG_ALG_REQUEST,
- CRYPTO_MSG_ALG_REGISTER,
-};
-
struct crypto_instance;
struct crypto_template;

@@ -90,8 +84,6 @@ struct crypto_alg *crypto_find_alg(const char *alg_name,
void *crypto_alloc_tfm(const char *alg_name,
const struct crypto_type *frontend, u32 type, u32 mask);

-int crypto_register_notifier(struct notifier_block *nb);
-int crypto_unregister_notifier(struct notifier_block *nb);
int crypto_probing_notify(unsigned long val, void *v);

unsigned int crypto_alg_extsize(struct crypto_alg *alg);
diff --git a/include/crypto/algapi.h b/include/crypto/algapi.h
index bd5e8ccf1687..807501a4a754 100644
--- a/include/crypto/algapi.h
+++ b/include/crypto/algapi.h
@@ -425,4 +425,14 @@ static inline void crypto_yield(u32 flags)
#endif
}

+int crypto_register_notifier(struct notifier_block *nb);
+int crypto_unregister_notifier(struct notifier_block *nb);
+
+/* Crypto notification events. */
+enum {
+ CRYPTO_MSG_ALG_REQUEST,
+ CRYPTO_MSG_ALG_REGISTER,
+ CRYPTO_MSG_ALG_LOADED,
+};
+
#endif /* _CRYPTO_ALGAPI_H */
--
2.17.1

2018-09-04 05:21:51

by Herbert Xu

Subject: Re: [PATCH v2 1/3] crypto: Introduce notifier for new crypto algorithms

On Thu, Aug 30, 2018 at 11:00:14AM -0400, Martin K. Petersen wrote:
> Introduce a facility that can be used to receive a notification
> callback when a new algorithm becomes available. This can be used by
> existing crypto registrations to trigger a switch from a software-only
> algorithm to a hardware-accelerated version.
>
> A new CRYPTO_MSG_ALG_LOADED state is introduced to the existing crypto
> notification chain, and the register/unregister functions are exported
> so they can be called by subsystems outside of crypto.
>
> Signed-off-by: Martin K. Petersen <[email protected]>
> Suggested-by: Herbert Xu <[email protected]>
> ---
> crypto/algapi.c | 2 ++
> crypto/algboss.c | 2 ++
> crypto/internal.h | 8 --------
> include/crypto/algapi.h | 10 ++++++++++
> 4 files changed, 14 insertions(+), 8 deletions(-)

All applied. Thanks.
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2018-09-04 13:30:06

by Torsten Duwe

Subject: Re: [PATCH v2 1/3] crypto: Introduce notifier for new crypto algorithms

On Thu, Aug 30, 2018 at 11:00:14AM -0400, Martin K. Petersen wrote:
> Introduce a facility that can be used to receive a notification
> callback when a new algorithm becomes available. This can be used by
> existing crypto registrations to trigger a switch from a software-only
> algorithm to a hardware-accelerated version.

While this is apparently fine with the patch set you sent, what about
cases where the crypto context of old and new implementation is not
100% compatible? The switch will still work for sure if there are no
current active users of the algo, but otherwise...? Just a thought.

Torsten
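
On the compatibility question: in the crc-t10dif patch in this thread, the only state the library seeds into the descriptor is the 16-bit running CRC (`*(__u16 *)desc.ctx = crc`). The sketch below assumes a transform whose entire context is that value and shows why two implementations can then be swapped mid-stream. The function names are made up for illustration and are not the kernel's. Whether every registered driver keeps so little context is the open question, and this sketch does not settle it.

```c
#include <stddef.h>
#include <stdint.h>

#define T10DIF_POLY 0x8bb7u	/* generator polynomial quoted in crct10dif_common.c */

/* Reference implementation: one bit at a time, most significant bit first. */
static uint16_t crc_t10dif_bitwise(uint16_t crc, const unsigned char *buf, size_t len)
{
	size_t i;
	int bit;

	for (i = 0; i < len; i++) {
		crc ^= (uint16_t)(buf[i] << 8);
		for (bit = 0; bit < 8; bit++) {
			if (crc & 0x8000u)
				crc = (uint16_t)((crc << 1) ^ T10DIF_POLY);
			else
				crc = (uint16_t)(crc << 1);
		}
	}
	return crc;
}

/* Lookup table derived from the bitwise routine, one entry per byte value. */
static uint16_t crc_table[256];

static void crc_t10dif_table_init(void)
{
	unsigned i;

	for (i = 0; i < 256; i++) {
		unsigned char b = (unsigned char)i;

		crc_table[i] = crc_t10dif_bitwise(0, &b, 1);
	}
}

/* Table-driven implementation: one byte at a time. */
static uint16_t crc_t10dif_bytewise(uint16_t crc, const unsigned char *buf, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		crc = (uint16_t)((crc << 8) ^ crc_table[((crc >> 8) ^ buf[i]) & 0xff]);
	return crc;
}
```

Because both routines take and return nothing but the running CRC, a buffer can be started with one and finished with the other and still yield the same checksum as either routine alone. An implementation that kept extra per-request state would not have that property.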