Subject: Re: [PATCH v4 1/6] mtd: nand: vf610_nfc: Freescale NFC for VF610, MPC5125 and others

On 2015-03-25 17:28:24 [+0100], Stefan Agner wrote:
> diff --git a/drivers/mtd/nand/vf610_nfc.c b/drivers/mtd/nand/vf610_nfc.c
> new file mode 100644
> index 0000000..23c1510
> --- /dev/null
> +++ b/drivers/mtd/nand/vf610_nfc.c
> @@ -0,0 +1,686 @@

> +static inline u32 vf610_nfc_read(struct vf610_nfc *nfc, uint reg)
> +{
> +	return readl(nfc->regs + reg);
> +}
> +
> +static inline void vf610_nfc_write(struct vf610_nfc *nfc, uint reg, u32 val)
> +{
> +	writel(val, nfc->regs + reg);
> +}


> +static void vf610_nfc_send_command(struct vf610_nfc *nfc, u32 cmd_byte1,
> +				   u32 cmd_code)
> +{
> +	void __iomem *reg = nfc->regs + NFC_FLASH_CMD2;
> +	u32 tmp;
> +
> +	vf610_nfc_clear_status(nfc);
> +
> +	tmp = __raw_readl(reg);
> +	tmp &= ~(CMD_BYTE1_MASK | CMD_CODE_MASK | BUFNO_MASK);
> +	tmp |= cmd_byte1 << CMD_BYTE1_SHIFT;
> +	tmp |= cmd_code << CMD_CODE_SHIFT;
> +	__raw_writel(tmp, reg);
> +}

Why readl() vs. __raw_readl()? Ditto for write.
vf610_nfc_{read|write} is good, since for PPC we would need out_be32()
here instead.
It would be nice if you could abstract the __raw_ ones as well. And I am
not sure you need those at all, since the former functions should work
here just fine.

Sebastian


2015-06-03 15:07:23

by Stefan Agner

Subject: Re: [PATCH v4 1/6] mtd: nand: vf610_nfc: Freescale NFC for VF610, MPC5125 and others

On 2015-06-03 15:08, Sebastian Andrzej Siewior wrote:
> On 2015-03-25 17:28:24 [+0100], Stefan Agner wrote:
>> diff --git a/drivers/mtd/nand/vf610_nfc.c b/drivers/mtd/nand/vf610_nfc.c
>> new file mode 100644
>> index 0000000..23c1510
>> --- /dev/null
>> +++ b/drivers/mtd/nand/vf610_nfc.c
>> @@ -0,0 +1,686 @@
> …
>> +static inline u32 vf610_nfc_read(struct vf610_nfc *nfc, uint reg)
>> +{
>> +	return readl(nfc->regs + reg);
>> +}
>> +
>> +static inline void vf610_nfc_write(struct vf610_nfc *nfc, uint reg, u32 val)
>> +{
>> +	writel(val, nfc->regs + reg);
>> +}
> …
>
>> +static void vf610_nfc_send_command(struct vf610_nfc *nfc, u32 cmd_byte1,
>> +				   u32 cmd_code)
>> +{
>> +	void __iomem *reg = nfc->regs + NFC_FLASH_CMD2;
>> +	u32 tmp;
>> +
>> +	vf610_nfc_clear_status(nfc);
>> +
>> +	tmp = __raw_readl(reg);
>> +	tmp &= ~(CMD_BYTE1_MASK | CMD_CODE_MASK | BUFNO_MASK);
>> +	tmp |= cmd_byte1 << CMD_BYTE1_SHIFT;
>> +	tmp |= cmd_code << CMD_CODE_SHIFT;
>> +	__raw_writel(tmp, reg);
>> +}
>
> Why readl() vs. __raw_readl()? Ditto for write.
> vf610_nfc_{read|write} is good, since for PPC we would need out_be32()
> here instead.
> It would be nice if you could abstract the __raw_ ones as well. And I am
> not sure you need those at all, since the former functions should work
> here just fine.

As Boris guessed correctly, the reason I used the raw variants was to
avoid the performance cost of the barriers. However, I will use
{read|write}l_relaxed instead, which should offer the endian abstraction
without the performance penalty of the extra barriers...
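For illustration, a minimal userspace sketch of what the relaxed wrappers
could look like (hypothetical stand-ins replace the kernel's I/O helpers
so the sketch is self-contained; on ARM the real readl_relaxed()/
writel_relaxed() are the barrier-free variants of readl()/writel()):

```c
#include <stdint.h>

typedef uint32_t u32;

/* Hypothetical stand-ins for the kernel's relaxed I/O helpers: plain
 * 32-bit accesses with no memory barrier (on ARM, readl()/writel()
 * additionally issue barriers around the access). */
static inline u32 readl_relaxed(const volatile void *addr)
{
	return *(const volatile u32 *)addr;
}

static inline void writel_relaxed(u32 val, volatile void *addr)
{
	*(volatile u32 *)addr = val;
}

struct vf610_nfc {
	unsigned char *regs;	/* kernel code uses void __iomem * */
};

static inline u32 vf610_nfc_read(struct vf610_nfc *nfc, unsigned int reg)
{
	return readl_relaxed(nfc->regs + reg);
}

static inline void vf610_nfc_write(struct vf610_nfc *nfc, unsigned int reg,
				   u32 val)
{
	writel_relaxed(val, nfc->regs + reg);
}
```

The call sites stay identical to the readl()/writel() version; only the
two wrappers change.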

--
Stefan

Subject: Re: [PATCH v4 1/6] mtd: nand: vf610_nfc: Freescale NFC for VF610, MPC5125 and others

On 2015-06-03 17:05:16 [+0200], Stefan Agner wrote:
> As Boris guessed correctly, the reason I used the raw variants was to
> avoid the performance cost of the barriers. However, I will use

yeah, do you have any numbers by chance?

> {read|write}l_relaxed instead, which should offer the endian abstraction
> without the performance penalty of the extra barriers...

well, even those
|$ git grep readl_relaxed arch/powerpc/
|arch/powerpc/include/asm/io.h:#define readl_relaxed(addr) readl(addr)

have the endian swap. So an abstraction like you provided earlier
would be nice :)

> --
> Stefan

Sebastian

2015-06-09 20:33:33

by Stefan Agner

Subject: Re: [PATCH v4 1/6] mtd: nand: vf610_nfc: Freescale NFC for VF610, MPC5125 and others

On 2015-06-09 22:07, Sebastian Andrzej Siewior wrote:
> On 2015-06-03 17:05:16 [+0200], Stefan Agner wrote:
>> As Boris guessed correctly, the reason I used the raw variants was to
>> avoid the performance cost of the barriers. However, I will use
>
> yeah, do you have any numbers by chance?

Not current ones for Linux. I added this in the U-Boot variant of the
driver.

>> {read|write}l_relaxed instead, which should offer the endian abstraction
>> without the performance penalty of the extra barriers...
>
> well, even those
> |$ git grep readl_relaxed arch/powerpc/
> |arch/powerpc/include/asm/io.h:#define readl_relaxed(addr) readl(addr)
>
> have the endian swap. So an abstraction like you provided earlier
> would be nice :)

What do you mean by that?

Btw, I sent v5 of the patchset, which uses the relaxed variants. As
expected, performance did not suffer from that change on ARM.

--
Stefan

Subject: Re: [PATCH v4 1/6] mtd: nand: vf610_nfc: Freescale NFC for VF610, MPC5125 and others

On 2015-06-09 22:31:07 [+0200], Stefan Agner wrote:
> > have the endian swap. So an abstraction like you provided earlier
> > would be nice :)
>
> What do you mean by that?

Something like you did for the reader where you have:

|static inline u32 vf610_nfc_read(struct vf610_nfc *nfc, uint reg)
|{
|	return readl(nfc->regs + reg);
|}

instead of using readl() directly. So you could also have

|static inline u32 vf610_nfc_read_relaxed(struct vf610_nfc *nfc, uint reg)
|{
|	return readl_relaxed(nfc->regs + reg);
|}

instead of calling readl_relaxed() directly. Unless I'm mistaken, that
function is used more than once. If someone adds PPC support, they would
not need to add this function everywhere, but could simply add an ifdef
in vf610_nfc_read() and vf610_nfc_read_relaxed() and replace it with
whatever works for them.
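A hypothetical sketch of that ifdef (userspace stand-ins replace the
kernel accessors so the idea is self-contained; in_be32() here models the
PPC big-endian accessor, which reads the bytes in big-endian order):

```c
#include <stdint.h>

typedef uint32_t u32;

/* Stand-in for readl_relaxed(): native 32-bit load, no barrier. */
static inline u32 readl_relaxed(const volatile void *addr)
{
	return *(const volatile u32 *)addr;
}

/* Stand-in for the PPC in_be32(): assemble the value from the bytes
 * in big-endian order, regardless of host endianness. */
static inline u32 in_be32(const volatile void *addr)
{
	const volatile uint8_t *p = addr;

	return ((u32)p[0] << 24) | ((u32)p[1] << 16) |
	       ((u32)p[2] << 8)  | (u32)p[3];
}

struct vf610_nfc {
	unsigned char *regs;	/* kernel code uses void __iomem * */
};

/* One accessor, switched per architecture: a PPC port only has to
 * touch this function (and its write/non-relaxed siblings), not
 * every call site in the driver. */
static inline u32 vf610_nfc_read_relaxed(struct vf610_nfc *nfc,
					 unsigned int reg)
{
#ifdef CONFIG_PPC
	return in_be32(nfc->regs + reg);
#else
	return readl_relaxed(nfc->regs + reg);
#endif
}
```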

Sebastian

2015-06-17 13:05:07

by Stefan Agner

Subject: Re: [PATCH v4 1/6] mtd: nand: vf610_nfc: Freescale NFC for VF610, MPC5125 and others

On 2015-06-09 22:07, Sebastian Andrzej Siewior wrote:
> On 2015-06-03 17:05:16 [+0200], Stefan Agner wrote:
>> As Boris guessed correctly, the reason I used the raw variants was to
>> avoid the performance cost of the barriers. However, I will use
>
> yeah, do you have any numbers by chance?
>

I just reevaluated the "performance optimizations". On a VF610 SoC (with
L2 cache) the optimizations improve performance by up to 2.5%. On a
VF500 SoC (without L2 cache) they seem to have slightly more impact, up
to 4%. Overall, they seem to affect writes more than reads.

Back then, when I implemented the optimizations, they certainly had a
bigger impact. I don't have a strong opinion on that...


=> VF610, with relaxed, optimized
[ 45.554246]
[ 45.555934] =================================================
[ 45.561762] mtd_speedtest: MTD device: 3
[ 45.576627] mtd_speedtest: MTD device size 104857600, eraseblock size 131072, page size 2048, count of eraseblocks 800, pages per eraseblock 64, OOB size 64
[ 45.595350] mtd_test: scanning for bad eraseblocks
[ 45.602177] mtd_test: scanned 800 eraseblocks, 0 are bad
[ 46.404682] mtd_speedtest: testing eraseblock write speed
[ 65.650704] mtd_speedtest: eraseblock write speed is 5344 KiB/s
[ 65.656738] mtd_speedtest: testing eraseblock read speed
[ 73.480687] mtd_speedtest: eraseblock read speed is 13118 KiB/s
[ 74.264593] mtd_speedtest: testing page write speed
[ 94.444190] mtd_speedtest: page write speed is 5087 KiB/s
[ 94.449674] mtd_speedtest: testing page read speed
[ 102.760387] mtd_speedtest: page read speed is 12343 KiB/s
[ 103.544838] mtd_speedtest: testing 2 page write speed
[ 123.257753] mtd_speedtest: 2 page write speed is 5199 KiB/s
[ 123.263440] mtd_speedtest: testing 2 page read speed
[ 131.171313] mtd_speedtest: 2 page read speed is 12963 KiB/s
[ 131.176981] mtd_speedtest: Testing erase speed
[ 131.968196] mtd_speedtest: erase speed is 130279 KiB/s
[ 131.973442] mtd_speedtest: Testing 2x multi-block erase speed
[ 132.153410] mtd_speedtest: 2x multi-block erase speed is 585142 KiB/s
[ 132.159924] mtd_speedtest: Testing 4x multi-block erase speed
[ 132.334864] mtd_speedtest: 4x multi-block erase speed is 609523 KiB/s
[ 132.341369] mtd_speedtest: Testing 8x multi-block erase speed
[ 132.516091] mtd_speedtest: 8x multi-block erase speed is 609523 KiB/s
[ 132.522602] mtd_speedtest: Testing 16x multi-block erase speed
[ 132.695800] mtd_speedtest: 16x multi-block erase speed is 613173 KiB/s
[ 132.702389] mtd_speedtest: Testing 32x multi-block erase speed
[ 132.874831] mtd_speedtest: 32x multi-block erase speed is 616867 KiB/s
[ 132.881445] mtd_speedtest: Testing 64x multi-block erase speed
[ 133.053647] mtd_speedtest: 64x multi-block erase speed is 616867 KiB/s
[ 133.060267] mtd_speedtest: finished
[ 133.063812] =================================================

=> VF610, without relaxed and using accessors
vf610_nfc_set/vf610_nfc_set_field and friends:
[ 60.015797]
[ 60.017481] =================================================
[ 60.023320] mtd_speedtest: MTD device: 3
[ 60.037232] mtd_speedtest: MTD device size 104857600, eraseblock size 131072, page size 2048, count of eraseblocks 800, pages per eraseblock 64, OOB size 64
[ 60.066359] mtd_test: scanning for bad eraseblocks
[ 60.074016] mtd_test: scanned 800 eraseblocks, 0 are bad
[ 60.286730] mtd_speedtest: testing eraseblock write speed
[ 79.679892] mtd_speedtest: eraseblock write speed is 5281 KiB/s
[ 79.685930] mtd_speedtest: testing eraseblock read speed
[ 87.563845] mtd_speedtest: eraseblock read speed is 13008 KiB/s
[ 88.353390] mtd_speedtest: testing page write speed
[ 108.984528] mtd_speedtest: page write speed is 4965 KiB/s
[ 108.990041] mtd_speedtest: testing page read speed
[ 117.012486] mtd_speedtest: page read speed is 12774 KiB/s
[ 117.801663] mtd_speedtest: testing 2 page write speed
[ 137.674009] mtd_speedtest: 2 page write speed is 5154 KiB/s
[ 137.679696] mtd_speedtest: testing 2 page read speed
[ 145.643303] mtd_speedtest: 2 page read speed is 12865 KiB/s
[ 145.648982] mtd_speedtest: Testing erase speed
[ 146.444423] mtd_speedtest: erase speed is 129456 KiB/s
[ 146.449669] mtd_speedtest: Testing 2x multi-block erase speed
[ 146.629634] mtd_speedtest: 2x multi-block erase speed is 588505 KiB/s
[ 146.636142] mtd_speedtest: Testing 4x multi-block erase speed
[ 146.813027] mtd_speedtest: 4x multi-block erase speed is 598830 KiB/s
[ 146.819577] mtd_speedtest: Testing 8x multi-block erase speed
[ 146.996654] mtd_speedtest: 8x multi-block erase speed is 595348 KiB/s
[ 147.003192] mtd_speedtest: Testing 16x multi-block erase speed
[ 147.178085] mtd_speedtest: 16x multi-block erase speed is 609523 KiB/s
[ 147.184703] mtd_speedtest: Testing 32x multi-block erase speed
[ 147.358306] mtd_speedtest: 32x multi-block erase speed is 613173 KiB/s
[ 147.364929] mtd_speedtest: Testing 64x multi-block erase speed
[ 147.540505] mtd_speedtest: 64x multi-block erase speed is 605917 KiB/s
[ 147.547106] mtd_speedtest: finished
[ 147.558336] =================================================

=> VF500, with relaxed, optimized
[ 42.878713]
[ 42.880775] =================================================
[ 42.886589] mtd_speedtest: MTD device: 3
[ 42.933491] mtd_speedtest: MTD device size 132120576, eraseblock size 131072, page size 2048, count of eraseblocks 1008, pages per eraseblock 64, OOB size 64
[ 42.981413] mtd_test: scanning for bad eraseblocks
[ 42.987505] mtd_test: block 142 is bad
[ 43.001342] mtd_test: block 1004 is bad
[ 43.024184] mtd_test: block 1005 is bad
[ 43.029184] mtd_test: block 1006 is bad
[ 43.045070] mtd_test: block 1007 is bad
[ 43.048975] mtd_test: scanned 1008 eraseblocks, 5 are bad
[ 43.854654] mtd_speedtest: testing eraseblock write speed
[ 68.873797] mtd_speedtest: eraseblock write speed is 5150 KiB/s
[ 68.879915] mtd_speedtest: testing eraseblock read speed
[ 80.409787] mtd_speedtest: eraseblock read speed is 11171 KiB/s
[ 81.053244] mtd_speedtest: testing page write speed
[ 109.851897] mtd_speedtest: page write speed is 4462 KiB/s
[ 109.857496] mtd_speedtest: testing page read speed
[ 121.964415] mtd_speedtest: page read speed is 10616 KiB/s
[ 122.603068] mtd_speedtest: testing 2 page write speed
[ 148.929905] mtd_speedtest: 2 page write speed is 4880 KiB/s
[ 148.935696] mtd_speedtest: testing 2 page read speed
[ 160.908554] mtd_speedtest: 2 page read speed is 10731 KiB/s
[ 160.914236] mtd_speedtest: Testing erase speed
[ 161.579429] mtd_speedtest: erase speed is 194226 KiB/s
[ 161.584680] mtd_speedtest: Testing 2x multi-block erase speed
[ 162.226599] mtd_speedtest: 2x multi-block erase speed is 201861 KiB/s
[ 162.233153] mtd_speedtest: Testing 4x multi-block erase speed
[ 162.870923] mtd_speedtest: 4x multi-block erase speed is 202818 KiB/s
[ 162.877591] mtd_speedtest: Testing 8x multi-block erase speed
[ 163.503176] mtd_speedtest: 8x multi-block erase speed is 207070 KiB/s
[ 163.509826] mtd_speedtest: Testing 16x multi-block erase speed
[ 164.138392] mtd_speedtest: 16x multi-block erase speed is 206405 KiB/s
[ 164.145128] mtd_speedtest: Testing 32x multi-block erase speed
[ 164.768705] mtd_speedtest: 32x multi-block erase speed is 207741 KiB/s
[ 164.775475] mtd_speedtest: Testing 64x multi-block erase speed
[ 165.394716] mtd_speedtest: 64x multi-block erase speed is 209094 KiB/s
[ 165.401347] mtd_speedtest: finished
[ 165.415581] =================================================

=> VF500, without relaxed and using accessors
[ 93.466642]
[ 93.468578] =================================================
[ 93.474563] mtd_speedtest: MTD device: 3
[ 93.510699] mtd_speedtest: MTD device size 132120576, eraseblock size 131072, page size 2048, count of eraseblocks 1008, pages per eraseblock 64, OOB size 64
[ 93.534490] mtd_test: scanning for bad eraseblocks
[ 93.553066] mtd_test: block 142 is bad
[ 93.586518] mtd_test: block 1004 is bad
[ 93.590559] mtd_test: block 1005 is bad
[ 93.595654] mtd_test: block 1006 is bad
[ 93.607090] mtd_test: block 1007 is bad
[ 93.630667] mtd_test: scanned 1008 eraseblocks, 5 are bad
[ 94.457510] mtd_speedtest: testing eraseblock write speed
[ 120.495695] mtd_speedtest: eraseblock write speed is 4932 KiB/s
[ 120.501726] mtd_speedtest: testing eraseblock read speed
[ 131.937006] mtd_speedtest: eraseblock read speed is 11233 KiB/s
[ 132.576468] mtd_speedtest: testing page write speed
[ 160.663012] mtd_speedtest: page write speed is 4572 KiB/s
[ 160.668608] mtd_speedtest: testing page read speed
[ 172.624020] mtd_speedtest: page read speed is 10743 KiB/s
[ 173.263221] mtd_speedtest: testing 2 page write speed
[ 199.811255] mtd_speedtest: 2 page write speed is 4837 KiB/s
[ 199.816933] mtd_speedtest: testing 2 page read speed
[ 211.558975] mtd_speedtest: 2 page read speed is 10939 KiB/s
[ 211.564657] mtd_speedtest: Testing erase speed
[ 212.227608] mtd_speedtest: erase speed is 194816 KiB/s
[ 212.232951] mtd_speedtest: Testing 2x multi-block erase speed
[ 212.875607] mtd_speedtest: 2x multi-block erase speed is 201544 KiB/s
[ 212.882254] mtd_speedtest: Testing 4x multi-block erase speed
[ 213.521870] mtd_speedtest: 4x multi-block erase speed is 202818 KiB/s
[ 213.528471] mtd_speedtest: Testing 8x multi-block erase speed
[ 214.154043] mtd_speedtest: 8x multi-block erase speed is 207070 KiB/s
[ 214.160690] mtd_speedtest: Testing 16x multi-block erase speed
[ 214.785910] mtd_speedtest: 16x multi-block erase speed is 207405 KiB/s
[ 214.792652] mtd_speedtest: Testing 32x multi-block erase speed
[ 215.418150] mtd_speedtest: 32x multi-block erase speed is 207405 KiB/s
[ 215.424883] mtd_speedtest: Testing 64x multi-block erase speed
[ 216.051367] mtd_speedtest: 64x multi-block erase speed is 206737 KiB/s
[ 216.058056] mtd_speedtest: finished
[ 216.069363] =================================================

--
Stefan

Subject: Re: [PATCH v4 1/6] mtd: nand: vf610_nfc: Freescale NFC for VF610, MPC5125 and others

On Wed, Jun 17, 2015 at 03:02:03PM +0200, Stefan Agner wrote:
> On 2015-06-09 22:07, Sebastian Andrzej Siewior wrote:
> > yeah, do you have any numbers by chance?
>
> I just reevaluated the "performance optimizations". On a VF610 SoC (with
> L2 cache) the optimizations improve performance by up to 2.5%. On a
> VF500 SoC (without L2 cache) they seem to have slightly more impact, up
> to 4%. Overall, they seem to affect writes more than reads.

it could have something to do with the CPU clock and how long it is blocked
due to the sync operation.

> Back then, when I implemented the improvements it certainly had a bigger
> impact. I don't have strong opinion on that...

Thanks for doing that. If you could take those numbers, put them in a
table (with/without relaxed and so on), note how you obtained them
(modprobe bla), and make them part of the commit message, then everybody
could see how much of a difference we are talking about. Thanks again.

> --
> Stefan

Sebastian