LinuxLists.cc - 32-bit Amlogic (ARM) SoC: kernel BUG in kfree()

2019-03-21 20:19:29

Subject: 32-bit Amlogic (ARM) SoC: kernel BUG in kfree()

Hello,

I am experiencing the following crash:
------------[ cut here ]------------
kernel BUG at mm/slub.c:3950!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 1 PID: 1 Comm: swapper/0 Not tainted
5.1.0-rc1-00080-g37b8cb064293-dirty #4252
Hardware name: Amlogic Meson platform
PC is at kfree+0x250/0x274
LR is at meson_nfc_exec_op+0x3b0/0x408
...
my goal is to add support for the 32-bit Amlogic Meson SoCs (ARM
Cortex-A5 / Cortex-A9 cores) in the meson-nand driver.

I have traced this crash to the kfree() in meson_nfc_read_buf().
my observation is as follows:
- meson_nfc_read_buf() is called 7 times without any crash, the
kzalloc() call returns 0xe9e6c600 (virtual address) / 0x29e6c600
(physical address)
- the eighth time meson_nfc_read_buf() is called, the kzalloc() call returns
0xee39a38b (virtual address) / 0x2e39a38b (physical address) and the
final kfree() crashes
- changing the size in the kzalloc() call from PER_INFO_BYTE (= 8) to
PAGE_SIZE works around that crash
- disabling the meson-nand driver makes my board boot just fine
- Liang has tested the unmodified code on a 64-bit Amlogic SoC (ARM
Cortex-A53 cores) and he doesn't see the crash there

in case the selected SLAB allocator is relevant:
CONFIG_SLUB=y

the following printk statement is used to print the addresses returned
by the kzalloc() call in meson_nfc_read_buf():
printk("%s 0x%px 0x%08x\n", __func__, info, virt_to_phys(info));

my questions are:
- why does kzalloc() return an unaligned address 0xee39a38b (virtual
address) / 0x2e39a38b (physical address)?
- how can I further analyze this issue?
- (I don't know where to start analyzing: in mm/, arch/arm/mm, the
meson-nand driver seems to work fine on the 64-bit SoCs but that
doesn't fully rule it out, ...)
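
Note on the "unaligned address" question: the check can be sketched in
userspace C as below. This is only an illustration, not kernel code.
is_aligned() is a hypothetical helper, and the 8-byte alignment used in
the usage note is an assumed minimum for small kmalloc objects (the
real minimum depends on the architecture and configuration). An odd
address such as 0xee39a38b fails the check for any alignment of 2 or
more, which hints that the pointer value itself was damaged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if addr is a multiple of align.
 * align must be a non-zero power of two.
 */
static bool is_aligned(uintptr_t addr, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr & (align - 1)) == 0;
}
```

For example, is_aligned(0xe9e6c600, 8) is true for the address seen in
the first seven calls, while is_aligned(0xee39a38b, 8) is false.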

Regards
Martin

2019-03-21 21:45:20

by Matthew Wilcox (Oracle)

Subject: Re: 32-bit Amlogic (ARM) SoC: kernel BUG in kfree()

On Thu, Mar 21, 2019 at 09:17:34PM +0100, Martin Blumenstingl wrote:
> Hello,
>
> I am experiencing the following crash:
> ------------[ cut here ]------------
> kernel BUG at mm/slub.c:3950!

if (unlikely(!PageSlab(page))) {
	BUG_ON(!PageCompound(page));

You called kfree() on the address of a page which wasn't allocated by slab.

> I have traced this crash to the kfree() in meson_nfc_read_buf().
> my observation is as follows:
> - meson_nfc_read_buf() is called 7 times without any crash, the
> kzalloc() call returns 0xe9e6c600 (virtual address) / 0x29e6c600
> (physical address)
> - the eight time meson_nfc_read_buf() is called kzalloc() call returns
> 0xee39a38b (virtual address) / 0x2e39a38b (physical address) and the
> final kfree() crashes
> - changing the size in the kzalloc() call from PER_INFO_BYTE (= 8) to
> PAGE_SIZE works around that crash

I suspect you're doing something which corrupts memory. Overrunning
the end of your allocation or something similar. Have you tried KASAN
or even the various slab debugging (eg redzones)?

2019-03-22 21:09:05

by Martin Blumenstingl

Subject: Re: 32-bit Amlogic (ARM) SoC: kernel BUG in kfree()

Hi Matthew,

On Thu, Mar 21, 2019 at 10:44 PM Matthew Wilcox <[email protected]> wrote:
>
> On Thu, Mar 21, 2019 at 09:17:34PM +0100, Martin Blumenstingl wrote:
> > Hello,
> >
> > I am experiencing the following crash:
> > ------------[ cut here ]------------
> > kernel BUG at mm/slub.c:3950!
>
> if (unlikely(!PageSlab(page))) {
> BUG_ON(!PageCompound(page));
>
> You called kfree() on the address of a page which wasn't allocated by slab.
>
> > I have traced this crash to the kfree() in meson_nfc_read_buf().
> > my observation is as follows:
> > - meson_nfc_read_buf() is called 7 times without any crash, the
> > kzalloc() call returns 0xe9e6c600 (virtual address) / 0x29e6c600
> > (physical address)
> > - the eight time meson_nfc_read_buf() is called kzalloc() call returns
> > 0xee39a38b (virtual address) / 0x2e39a38b (physical address) and the
> > final kfree() crashes
> > - changing the size in the kzalloc() call from PER_INFO_BYTE (= 8) to
> > PAGE_SIZE works around that crash
>
> I suspect you're doing something which corrupts memory. Overrunning
> the end of your allocation or something similar. Have you tried KASAN
> or even the various slab debugging (eg redzones)?
KASAN is not available on 32-bit ARM. there was some progress last
year [0] but it didn't make it into mainline. I tried to make the
patches apply again and got it to compile (and my kernel is still
booting) but I have no idea if it's still working. for anyone
interested, my patches are here: [1] (I consider this a HACK because I
don't know anything about the code which is being touched in the
patches, I only made it compile)

SLAB debugging (redzones) was a great hint, thank you very much for
that Matthew! I enabled:
CONFIG_SLUB_DEBUG=y
CONFIG_SLUB_DEBUG_ON=y
and with that I now get "BUG kmalloc-64 (Not tainted): Redzone
overwritten" (a larger kernel log extract is attached).
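
Note: the idea behind a redzone can be sketched in userspace C as
below. This is a simplified model, not how SLUB implements it:
guarded_alloc(), redzone_intact(), REDZONE_BYTE and REDZONE_SIZE are
hypothetical names and values chosen for illustration. A guard area
filled with a known pattern is placed right after the object; if
anything writes past the end of the object, the pattern changes and
the overrun can be reported.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define REDZONE_BYTE 0xbb	/* illustrative poison value */
#define REDZONE_SIZE 8		/* illustrative guard size in bytes */

/* Allocate a zeroed object of 'size' bytes followed by a poisoned guard. */
static unsigned char *guarded_alloc(size_t size)
{
	unsigned char *p = calloc(1, size + REDZONE_SIZE);

	if (p)
		memset(p + size, REDZONE_BYTE, REDZONE_SIZE);
	return p;
}

/* Return true if the guard behind the object still holds the pattern. */
static bool redzone_intact(const unsigned char *p, size_t size)
{
	assert(p != NULL);
	for (size_t i = 0; i < REDZONE_SIZE; i++)
		if (p[size + i] != REDZONE_BYTE)
			return false;
	return true;
}
```

Writing 8 bytes into an object from guarded_alloc(8) leaves the guard
intact; writing 16 bytes into it makes redzone_intact() return false,
which is the situation the "Redzone overwritten" report describes.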

I'm starting to wonder if the NAND controller (hardware) writes more
than 8 bytes.
some context: the "info" buffer allocated in meson_nfc_read_buf is
then passed to the NAND controller IP (after using dma_map_single).

Liang, how does the NAND controller know that it only has to send
PER_INFO_BYTE (= 8) bytes when called from meson_nfc_read_buf? all
other callers of meson_nfc_dma_buffer_setup (which passes the info
buffer to the hardware) are using (nand->ecc.steps * PER_INFO_BYTE)
bytes?

Regards
Martin

[0] https://lore.kernel.org/patchwork/cover/913212/
[1] https://github.com/xdarklight/linux/tree/arm-kasan-hack-v5.1-rc1

Attachments:

slub-redzones.txt (22.95 kB)

2019-03-25 10:05:29

by Liang Yang

Subject: Re: 32-bit Amlogic (ARM) SoC: kernel BUG in kfree()

Hi Martin,

On 2019/3/23 5:07, Martin Blumenstingl wrote:
> Hi Matthew,
>
> On Thu, Mar 21, 2019 at 10:44 PM Matthew Wilcox <[email protected]> wrote:
>>
>> On Thu, Mar 21, 2019 at 09:17:34PM +0100, Martin Blumenstingl wrote:
>>> Hello,
>>>
>>> I am experiencing the following crash:
>>> ------------[ cut here ]------------
>>> kernel BUG at mm/slub.c:3950!
>>
>> if (unlikely(!PageSlab(page))) {
>> BUG_ON(!PageCompound(page));
>>
>> You called kfree() on the address of a page which wasn't allocated by slab.
>>
>>> I have traced this crash to the kfree() in meson_nfc_read_buf().
>>> my observation is as follows:
>>> - meson_nfc_read_buf() is called 7 times without any crash, the
>>> kzalloc() call returns 0xe9e6c600 (virtual address) / 0x29e6c600
>>> (physical address)
>>> - the eight time meson_nfc_read_buf() is called kzalloc() call returns
>>> 0xee39a38b (virtual address) / 0x2e39a38b (physical address) and the
>>> final kfree() crashes
>>> - changing the size in the kzalloc() call from PER_INFO_BYTE (= 8) to
>>> PAGE_SIZE works around that crash
>>
>> I suspect you're doing something which corrupts memory. Overrunning
>> the end of your allocation or something similar. Have you tried KASAN
>> or even the various slab debugging (eg redzones)?
> KASAN is not available on 32-bit ARM. there was some progress last
> year [0] but it didn't make it into mainline. I tried to make the
> patches apply again and got it to compile (and my kernel is still
> booting) but I have no idea if it's still working. for anyone
> interested, my patches are here: [1] (I consider this a HACK because I
> don't know anything about the code which is being touched in the
> patches, I only made it compile)
>
> SLAB debugging (redzones) were a great hint, thank you very much for
> that Matthew! I enabled:
> CONFIG_SLUB_DEBUG=y
> CONFIG_SLUB_DEBUG_ON=y
> and with that I now get "BUG kmalloc-64 (Not tainted): Redzone
> overwritten" (a larger kernel log extract is attached).
>
> I'm starting to wonder if the NAND controller (hardware) writes more
> than 8 bytes.
> some context: the "info" buffer allocated in meson_nfc_read_buf is
> then passed to the NAND controller IP (after using dma_map_single).
>
> Liang, how does the NAND controller know that it only has to send
> PER_INFO_BYTE (= 8) bytes when called from meson_nfc_read_buf? all
> other callers of meson_nfc_dma_buffer_setup (which passes the info
> buffer to the hardware) are using (nand->ecc.steps * PER_INFO_BYTE)
> bytes?
>
NFC_CMD_N2M and CMDRWGEN are different commands. CMDRWGEN needs to set
the ecc page size (1KB or 512B) and Pages(2, 4, 8, ...), so
PER_INFO_BYTE(= 8) bytes for each ecc page.
I have never used NFC_CMD_N2M to transfer data before, because it is
very inefficient. I did an experiment with the attached patch and found
no overwrite on my meson axg platform.
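
Note: the sizing rule described above for CMDRWGEN (PER_INFO_BYTE
bytes of info per ECC page) can be written as a small userspace C
sketch. info_len_for_steps() is a hypothetical helper, not a function
from the meson-nand driver; only PER_INFO_BYTE (= 8) comes from the
thread.

```c
#include <assert.h>
#include <stddef.h>

#define PER_INFO_BYTE 8	/* info bytes per ECC page, as stated in the thread */

/* Info buffer size for a CMDRWGEN-style transfer over ecc_steps ECC pages. */
static size_t info_len_for_steps(unsigned int ecc_steps)
{
	assert(ecc_steps > 0);
	return (size_t)ecc_steps * PER_INFO_BYTE;
}
```

With 16 ECC steps this gives 128 bytes, and with a single step it
gives the 8 bytes that meson_nfc_read_buf() allocates.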

Martin, I would appreciate it very much if you would try the attachment
on your meson m8b platform.

>
> Regards
> Martin
>
>
> [0] https://lore.kernel.org/patchwork/cover/913212/
> [1] https://github.com/xdarklight/linux/tree/arm-kasan-hack-v5.1-rc1
>

Attachments:

nand_debug.diff (1.08 kB)

2019-03-25 18:33:02

by Martin Blumenstingl

Subject: Re: 32-bit Amlogic (ARM) SoC: kernel BUG in kfree()

Hi Liang,

On Mon, Mar 25, 2019 at 11:03 AM Liang Yang <[email protected]> wrote:
>
> Hi Martin,
>
> On 2019/3/23 5:07, Martin Blumenstingl wrote:
> > Hi Matthew,
> >
> > On Thu, Mar 21, 2019 at 10:44 PM Matthew Wilcox <[email protected]> wrote:
> >>
> >> On Thu, Mar 21, 2019 at 09:17:34PM +0100, Martin Blumenstingl wrote:
> >>> Hello,
> >>>
> >>> I am experiencing the following crash:
> >>> ------------[ cut here ]------------
> >>> kernel BUG at mm/slub.c:3950!
> >>
> >> if (unlikely(!PageSlab(page))) {
> >> BUG_ON(!PageCompound(page));
> >>
> >> You called kfree() on the address of a page which wasn't allocated by slab.
> >>
> >>> I have traced this crash to the kfree() in meson_nfc_read_buf().
> >>> my observation is as follows:
> >>> - meson_nfc_read_buf() is called 7 times without any crash, the
> >>> kzalloc() call returns 0xe9e6c600 (virtual address) / 0x29e6c600
> >>> (physical address)
> >>> - the eight time meson_nfc_read_buf() is called kzalloc() call returns
> >>> 0xee39a38b (virtual address) / 0x2e39a38b (physical address) and the
> >>> final kfree() crashes
> >>> - changing the size in the kzalloc() call from PER_INFO_BYTE (= 8) to
> >>> PAGE_SIZE works around that crash
> >>
> >> I suspect you're doing something which corrupts memory. Overrunning
> >> the end of your allocation or something similar. Have you tried KASAN
> >> or even the various slab debugging (eg redzones)?
> > KASAN is not available on 32-bit ARM. there was some progress last
> > year [0] but it didn't make it into mainline. I tried to make the
> > patches apply again and got it to compile (and my kernel is still
> > booting) but I have no idea if it's still working. for anyone
> > interested, my patches are here: [1] (I consider this a HACK because I
> > don't know anything about the code which is being touched in the
> > patches, I only made it compile)
> >
> > SLAB debugging (redzones) were a great hint, thank you very much for
> > that Matthew! I enabled:
> > CONFIG_SLUB_DEBUG=y
> > CONFIG_SLUB_DEBUG_ON=y
> > and with that I now get "BUG kmalloc-64 (Not tainted): Redzone
> > overwritten" (a larger kernel log extract is attached).
> >
> > I'm starting to wonder if the NAND controller (hardware) writes more
> > than 8 bytes.
> > some context: the "info" buffer allocated in meson_nfc_read_buf is
> > then passed to the NAND controller IP (after using dma_map_single).
> >
> > Liang, how does the NAND controller know that it only has to send
> > PER_INFO_BYTE (= 8) bytes when called from meson_nfc_read_buf? all
> > other callers of meson_nfc_dma_buffer_setup (which passes the info
> > buffer to the hardware) are using (nand->ecc.steps * PER_INFO_BYTE)
> > bytes?
> >
> NFC_CMD_N2M and CMDRWGEN are different commands. CMDRWGEN needs to set
> the ecc page size (1KB or 512B) and Pages(2, 4, 8, ...), so
> PER_INFO_BYTE(= 8) bytes for each ecc page.
> I have never used NFC_CMD_N2M to transfer data before, because it is
> very low efficient. And I do a experiment with the attachment and find
> on overwritten on my meson axg platform.
>
> Martin, I would appreciate it very much if you would try the attachment
> on your meson m8b platform.
thank you for your debug patch! on my board 2 * PER_INFO_BYTE is not enough.
I took the idea from your patch and adapted it so I could print a
buffer with 256 bytes (which seems to be "big enough" for my board).
see the attached, modified patch

in the output I see that sometimes the first 32 bytes are not touched
by the controller, but everything beyond 32 bytes is modified in the
info buffer.

I also tried to increase the buffer size to 512, but that didn't make
a difference (I never saw any info buffer modification beyond 256
bytes).

also I just noticed that I didn't give you many details on my NAND chip yet.
from Amlogic vendor u-boot on Meson8m2 (all my Meson8b boards have
eMMC flash, but I believe the NAND controller on Meson8 to GXBB is
identical):
m8m2_n200_v1#amlnf chipinfo
flash info
name:B revision 20nm NAND 8GiB H27UCG8T2B, id:ad de 94 eb 74 44 0 0
pagesize:0x4000, blocksize:0x400000, oobsize:0x500, chipsize:0x2000,
option:0x8, T_REA:16, T_RHOH:15
hw controller info
chip_num:1, onfi_mode:0, page_shift:14, block_shift:22, option:0xc2
ecc_unit:1024, ecc_bytes:70, ecc_steps:16, ecc_max:40
bch_mode:5, user_mode:2, oobavail:32, oobtail:64384

Regards

Martin

Attachments:

debug-256-buffer-output.txt (7.89 kB)
nand_debug_martin.patch (986.00 B)

2019-03-27 08:57:44

by Liang Yang

Subject: Re: 32-bit Amlogic (ARM) SoC: kernel BUG in kfree()

Hi Martin,

Thanks a lot.
On 2019/3/26 2:31, Martin Blumenstingl wrote:
> Hi Liang,
>
> On Mon, Mar 25, 2019 at 11:03 AM Liang Yang <[email protected]> wrote:
>>
>> Hi Martin,
>>
>> On 2019/3/23 5:07, Martin Blumenstingl wrote:
>>> Hi Matthew,
>>>
>>> On Thu, Mar 21, 2019 at 10:44 PM Matthew Wilcox <[email protected]> wrote:
>>>>
>>>> On Thu, Mar 21, 2019 at 09:17:34PM +0100, Martin Blumenstingl wrote:
>>>>> Hello,
>>>>>
>>>>> I am experiencing the following crash:
>>>>> ------------[ cut here ]------------
>>>>> kernel BUG at mm/slub.c:3950!
>>>>
>>>> if (unlikely(!PageSlab(page))) {
>>>> BUG_ON(!PageCompound(page));
>>>>
>>>> You called kfree() on the address of a page which wasn't allocated by slab.
>>>>
>>>>> I have traced this crash to the kfree() in meson_nfc_read_buf().
>>>>> my observation is as follows:
>>>>> - meson_nfc_read_buf() is called 7 times without any crash, the
>>>>> kzalloc() call returns 0xe9e6c600 (virtual address) / 0x29e6c600
>>>>> (physical address)
>>>>> - the eight time meson_nfc_read_buf() is called kzalloc() call returns
>>>>> 0xee39a38b (virtual address) / 0x2e39a38b (physical address) and the
>>>>> final kfree() crashes
>>>>> - changing the size in the kzalloc() call from PER_INFO_BYTE (= 8) to
>>>>> PAGE_SIZE works around that crash
>>>>
>>>> I suspect you're doing something which corrupts memory. Overrunning
>>>> the end of your allocation or something similar. Have you tried KASAN
>>>> or even the various slab debugging (eg redzones)?
>>> KASAN is not available on 32-bit ARM. there was some progress last
>>> year [0] but it didn't make it into mainline. I tried to make the
>>> patches apply again and got it to compile (and my kernel is still
>>> booting) but I have no idea if it's still working. for anyone
>>> interested, my patches are here: [1] (I consider this a HACK because I
>>> don't know anything about the code which is being touched in the
>>> patches, I only made it compile)
>>>
>>> SLAB debugging (redzones) were a great hint, thank you very much for
>>> that Matthew! I enabled:
>>> CONFIG_SLUB_DEBUG=y
>>> CONFIG_SLUB_DEBUG_ON=y
>>> and with that I now get "BUG kmalloc-64 (Not tainted): Redzone
>>> overwritten" (a larger kernel log extract is attached).
>>>
>>> I'm starting to wonder if the NAND controller (hardware) writes more
>>> than 8 bytes.
>>> some context: the "info" buffer allocated in meson_nfc_read_buf is
>>> then passed to the NAND controller IP (after using dma_map_single).
>>>
>>> Liang, how does the NAND controller know that it only has to send
>>> PER_INFO_BYTE (= 8) bytes when called from meson_nfc_read_buf? all
>>> other callers of meson_nfc_dma_buffer_setup (which passes the info
>>> buffer to the hardware) are using (nand->ecc.steps * PER_INFO_BYTE)
>>> bytes?
>>>
>> NFC_CMD_N2M and CMDRWGEN are different commands. CMDRWGEN needs to set
>> the ecc page size (1KB or 512B) and Pages(2, 4, 8, ...), so
>> PER_INFO_BYTE(= 8) bytes for each ecc page.
>> I have never used NFC_CMD_N2M to transfer data before, because it is
>> very low efficient. And I do a experiment with the attachment and find
>> on overwritten on my meson axg platform.
>>
>> Martin, I would appreciate it very much if you would try the attachment
>> on your meson m8b platform.
> thank you for your debug patch! on my board 2 * PER_INFO_BYTE is not enough.
> I took the idea from your patch and adapted it so I could print a
> buffer with 256 bytes (which seems to be "big enough" for my board).
it only needs PER_INFO_BYTE (= 8) bytes, because NFC_CMD_N2M doesn't set
*Pages*, unlike CMDRWGEN, which needs Pages*PER_INFO_BYTE (= 8) bytes
when the *Pages* parameter is set. I have been assuming that
NFC_CMD_N2M only occupies PER_INFO_BYTE (= 8) bytes. I have also tried
not setting the info address, but then the machine crashes.
> see the attached, modified patch
>
> in the output I see that sometimes the first 32 bytes are not touched
> by the controller, but everything beyond 32 bytes is modified in the
> info buffer.
>
it really makes sense that the controller sometimes fills the space
beyond the first 8 bytes. However i expect the controller should only
take the first 8 bytes when using NFC_CMD_N2M.
> I also tried to increase the buffer size to 512, but that didn't make
> a difference (I never saw any info buffer modification beyond 256
> bytes).
>
> also I just noticed that I didn't give you much details on my NAND chip yet.
> from Amlogic vendor u-boot on Meson8m2 (all my Meson8b boards have
> eMMC flash, but I believe the NAND controller on Meson8 to GXBB is
> identical):
> m8m2_n200_v1#amlnf chipinfo
> flash info
> name:B revision 20nm NAND 8GiB H27UCG8T2B, id:ad de 94 eb 74 44 0 0
> pagesize:0x4000, blocksize:0x400000, oobsize:0x500, chipsize:0x2000,
> option:0x8, T_REA:16, T_RHOH:15
> hw controller info
> chip_num:1, onfi_mode:0, page_shift:14, block_shift:22, option:0xc2
> ecc_unit:1024, ecc_bytes:70, ecc_steps:16, ecc_max:40
> bch_mode:5, user_mode:2, oobavail:32, oobtail:64384
>
I don't think it is caused by a different NAND type, but I have run
the same test on my GXL platform. The result is in the attachment.
By the way, I can't find any information about this in the meson
NFC datasheet, so I will ask our VLSI designer.
Martin, could you reproduce it with the new patch on the meson8b
platform? I need a clearer log that is easy to compare, like gxl.txt.
Thanks.

>
> Regards
>
> Martin
>

Attachments:

nand_debug.diff (3.32 kB)
gxl.txt (25.80 kB)

2019-03-28 18:04:34

by Martin Blumenstingl

Subject: Re: 32-bit Amlogic (ARM) SoC: kernel BUG in kfree()

Hi Liang,

On Wed, Mar 27, 2019 at 9:52 AM Liang Yang <[email protected]> wrote:
>
> Hi Martin,
>
> Thanks a lot.
> On 2019/3/26 2:31, Martin Blumenstingl wrote:
> > Hi Liang,
> >
> > On Mon, Mar 25, 2019 at 11:03 AM Liang Yang <[email protected]> wrote:
> >>
> >> Hi Martin,
> >>
> >> On 2019/3/23 5:07, Martin Blumenstingl wrote:
> >>> Hi Matthew,
> >>>
> >>> On Thu, Mar 21, 2019 at 10:44 PM Matthew Wilcox <[email protected]> wrote:
> >>>>
> >>>> On Thu, Mar 21, 2019 at 09:17:34PM +0100, Martin Blumenstingl wrote:
> >>>>> Hello,
> >>>>>
> >>>>> I am experiencing the following crash:
> >>>>> ------------[ cut here ]------------
> >>>>> kernel BUG at mm/slub.c:3950!
> >>>>
> >>>> if (unlikely(!PageSlab(page))) {
> >>>> BUG_ON(!PageCompound(page));
> >>>>
> >>>> You called kfree() on the address of a page which wasn't allocated by slab.
> >>>>
> >>>>> I have traced this crash to the kfree() in meson_nfc_read_buf().
> >>>>> my observation is as follows:
> >>>>> - meson_nfc_read_buf() is called 7 times without any crash, the
> >>>>> kzalloc() call returns 0xe9e6c600 (virtual address) / 0x29e6c600
> >>>>> (physical address)
> >>>>> - the eight time meson_nfc_read_buf() is called kzalloc() call returns
> >>>>> 0xee39a38b (virtual address) / 0x2e39a38b (physical address) and the
> >>>>> final kfree() crashes
> >>>>> - changing the size in the kzalloc() call from PER_INFO_BYTE (= 8) to
> >>>>> PAGE_SIZE works around that crash
> >>>>
> >>>> I suspect you're doing something which corrupts memory. Overrunning
> >>>> the end of your allocation or something similar. Have you tried KASAN
> >>>> or even the various slab debugging (eg redzones)?
> >>> KASAN is not available on 32-bit ARM. there was some progress last
> >>> year [0] but it didn't make it into mainline. I tried to make the
> >>> patches apply again and got it to compile (and my kernel is still
> >>> booting) but I have no idea if it's still working. for anyone
> >>> interested, my patches are here: [1] (I consider this a HACK because I
> >>> don't know anything about the code which is being touched in the
> >>> patches, I only made it compile)
> >>>
> >>> SLAB debugging (redzones) were a great hint, thank you very much for
> >>> that Matthew! I enabled:
> >>> CONFIG_SLUB_DEBUG=y
> >>> CONFIG_SLUB_DEBUG_ON=y
> >>> and with that I now get "BUG kmalloc-64 (Not tainted): Redzone
> >>> overwritten" (a larger kernel log extract is attached).
> >>>
> >>> I'm starting to wonder if the NAND controller (hardware) writes more
> >>> than 8 bytes.
> >>> some context: the "info" buffer allocated in meson_nfc_read_buf is
> >>> then passed to the NAND controller IP (after using dma_map_single).
> >>>
> >>> Liang, how does the NAND controller know that it only has to send
> >>> PER_INFO_BYTE (= 8) bytes when called from meson_nfc_read_buf? all
> >>> other callers of meson_nfc_dma_buffer_setup (which passes the info
> >>> buffer to the hardware) are using (nand->ecc.steps * PER_INFO_BYTE)
> >>> bytes?
> >>>
> >> NFC_CMD_N2M and CMDRWGEN are different commands. CMDRWGEN needs to set
> >> the ecc page size (1KB or 512B) and Pages(2, 4, 8, ...), so
> >> PER_INFO_BYTE(= 8) bytes for each ecc page.
> >> I have never used NFC_CMD_N2M to transfer data before, because it is
> >> very low efficient. And I do a experiment with the attachment and find
> >> on overwritten on my meson axg platform.
> >>
> >> Martin, I would appreciate it very much if you would try the attachment
> >> on your meson m8b platform.
> > thank you for your debug patch! on my board 2 * PER_INFO_BYTE is not enough.
> > I took the idea from your patch and adapted it so I could print a
> > buffer with 256 bytes (which seems to be "big enough" for my board).
> it only needs PER_INFO_BYTE (= 8) bytes, because NFC_CMD_N2M don't set
> *Pages*, that is not like CMDRWGEN which needs Pages*PER_INFO_BYTE (= 8)
> bytes when setting *Pages* parameter. I have been thinking that
> NFC_CMD_N2M only occupis PER_INFO_BYTE (= 8) bytes. And i have tried to
> not set the info address, the machine would crash.
thank you for the explanation. the command is built using:
cmd = NFC_CMD_N2M | (len & GENMASK(5, 0));
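
Note: the length part of that expression can be tried out in userspace
C. GENMASK32() below is a stand-in for the kernel's GENMASK() macro and
n2m_len_field() is a hypothetical helper; the NFC_CMD_N2M bits are left
out on purpose. GENMASK(5, 0) is 0x3f, so only the low six bits of
"len" end up in the command word. How the controller interprets that
field is not established in this thread; the sketch only shows the
arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's GENMASK(h, l), 32-bit only. */
#define GENMASK32(h, l) \
	((uint32_t)((~UINT32_C(0) >> (31 - (h))) & (~UINT32_C(0) << (l))))

/* Length field as it would be placed into the N2M command word. */
static uint32_t n2m_len_field(uint32_t len)
{
	uint32_t field = len & GENMASK32(5, 0);

	assert(field <= 63);	/* six bits can hold at most 63 */
	return field;
}
```

A length of 8 stays 8 and 63 stays 63, but 64 and 256 both become 0
after masking.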

> > see the attached, modified patch
> >
> > in the output I see that sometimes the first 32 bytes are not touched
> > by the controller, but everything beyond 32 bytes is modified in the
> > info buffer.
> >
> it really makes sense that the controller sometimes fills the space
> beyond the first 8 bytes. However i expect the controller should only
> take the first 8 bytes when using NFC_CMD_N2M.
in my tests (see the attached log output) it seems that the info
buffer size has the following constraints:
- use the "len" which is passed to meson_nfc_read_buf
- if "len" is smaller than PER_INFO_BYTE then use PER_INFO_BYTE (= 8)
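
Note: the two constraints above can be sketched in userspace C as
follows. n2m_info_len() is a hypothetical helper name, and the rule
itself is only what these tests on the 32-bit SoC suggest, not
documented controller behaviour.

```c
#include <assert.h>
#include <stddef.h>

#define PER_INFO_BYTE 8	/* minimum info buffer size, from the thread */

/* Info buffer size to allocate for an N2M read of 'len' bytes. */
static size_t n2m_info_len(size_t len)
{
	assert(len > 0);
	return len < PER_INFO_BYTE ? PER_INFO_BYTE : len;
}
```

A 1-byte read would get an 8-byte info buffer, while a 256-byte read
would get a 256-byte info buffer.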

> > I also tried to increase the buffer size to 512, but that didn't make
> > a difference (I never saw any info buffer modification beyond 256
> > bytes).
> >
> > also I just noticed that I didn't give you much details on my NAND chip yet.
> > from Amlogic vendor u-boot on Meson8m2 (all my Meson8b boards have
> > eMMC flash, but I believe the NAND controller on Meson8 to GXBB is
> > identical):
> > m8m2_n200_v1#amlnf chipinfo
> > flash info
> > name:B revision 20nm NAND 8GiB H27UCG8T2B, id:ad de 94 eb 74 44 0 0
> > pagesize:0x4000, blocksize:0x400000, oobsize:0x500, chipsize:0x2000,
> > option:0x8, T_REA:16, T_RHOH:15
> > hw controller info
> > chip_num:1, onfi_mode:0, page_shift:14, block_shift:22, option:0xc2
> > ecc_unit:1024, ecc_bytes:70, ecc_steps:16, ecc_max:40
> > bch_mode:5, user_mode:2, oobavail:32, oobtail:64384
> >
> I don't think it is caused by a different NAND type, but i have followed
> the some test on my GXL platform. we can see the result from the
> attachment. By the way, i don't find any information about this on meson
> NFC datasheet, so i will ask our VLSI.
> Martin, May you reproduce it with the new patch on meson8b platform ? I
> need a more clear and easier compared log like gxl.txt. Thanks.
your gxl.txt is great, finally I can also compare my own results with
something that works for you!
in my results (see attachment) the "DATA_IN [256 B, force 8-bit]"
instructions result in a different info buffer output.
does this make any sense to you?

Regards
Martin

Attachments:

nand-debug-output-operations-and-info-buffer.txt (15.43 kB)

2019-03-29 07:44:51

by Liang Yang

Subject: Re: 32-bit Amlogic (ARM) SoC: kernel BUG in kfree()

Hi Martin,

On 2019/3/29 2:03, Martin Blumenstingl wrote:
> Hi Liang,
[......]
>> I don't think it is caused by a different NAND type, but i have followed
>> the some test on my GXL platform. we can see the result from the
>> attachment. By the way, i don't find any information about this on meson
>> NFC datasheet, so i will ask our VLSI.
>> Martin, May you reproduce it with the new patch on meson8b platform ? I
>> need a more clear and easier compared log like gxl.txt. Thanks.
> your gxl.txt is great, finally I can also compare my own results with
> something that works for you!
> in my results (see attachment) the "DATA_IN [256 B, force 8-bit]"
> instructions result in a different info buffer output.
> does this make any sense to you?
>
I have asked our VLSI designer by e-mail for an explanation or a
simulation result. Thanks.

2019-04-05 04:32:44

by Martin Blumenstingl

Subject: Re: 32-bit Amlogic (ARM) SoC: kernel BUG in kfree()

Hi Liang,

On Fri, Mar 29, 2019 at 8:44 AM Liang Yang <[email protected]> wrote:
>
> Hi Martin,
>
> On 2019/3/29 2:03, Martin Blumenstingl wrote:
> > Hi Liang,
> [......]
> >> I don't think it is caused by a different NAND type, but i have followed
> >> the some test on my GXL platform. we can see the result from the
> >> attachment. By the way, i don't find any information about this on meson
> >> NFC datasheet, so i will ask our VLSI.
> >> Martin, May you reproduce it with the new patch on meson8b platform ? I
> >> need a more clear and easier compared log like gxl.txt. Thanks.
> > your gxl.txt is great, finally I can also compare my own results with
> > something that works for you!
> > in my results (see attachment) the "DATA_IN [256 B, force 8-bit]"
> > instructions result in a different info buffer output.
> > does this make any sense to you?
> >
> I have asked our VLSI designer for explanation or simulation result by
> an e-mail. Thanks.
do you have any update on this?

Martin

2019-04-10 15:27:40

by Liang Yang

Subject: Re: 32-bit Amlogic (ARM) SoC: kernel BUG in kfree()

Hi Martin,

On 2019/4/5 12:30, Martin Blumenstingl wrote:
> Hi Liang,
>
> On Fri, Mar 29, 2019 at 8:44 AM Liang Yang <[email protected]> wrote:
>>
>> Hi Martin,
>>
>> On 2019/3/29 2:03, Martin Blumenstingl wrote:
>>> Hi Liang,
>> [......]
>>>> I don't think it is caused by a different NAND type, but i have followed
>>>> the some test on my GXL platform. we can see the result from the
>>>> attachment. By the way, i don't find any information about this on meson
>>>> NFC datasheet, so i will ask our VLSI.
>>>> Martin, May you reproduce it with the new patch on meson8b platform ? I
>>>> need a more clear and easier compared log like gxl.txt. Thanks.
>>> your gxl.txt is great, finally I can also compare my own results with
>>> something that works for you!
>>> in my results (see attachment) the "DATA_IN [256 B, force 8-bit]"
>>> instructions result in a different info buffer output.
>>> does this make any sense to you?
>>>
>> I have asked our VLSI designer for explanation or simulation result by
>> an e-mail. Thanks.
> do you have any update on this?
Sorry, I haven't got a reply from the VLSI designer yet. We tried to
raise the priority yesterday, but I still can't estimate the time.
There is no document or change list showing the difference between
m8/b and gxl/axg series chips. For now it seems that we can't use the
NFC_CMD_N2M command during NAND initialization on m8/b chips and
should use *read byte from NFC fifo register* instead.
>
> Martin
>
> .
>

2019-04-10 19:08:02

by Martin Blumenstingl

Subject: Re: 32-bit Amlogic (ARM) SoC: kernel BUG in kfree()

Hi Liang,

On Wed, Apr 10, 2019 at 1:08 PM Liang Yang <[email protected]> wrote:
>
> Hi Martin,
>
> On 2019/4/5 12:30, Martin Blumenstingl wrote:
> > Hi Liang,
> >
> > On Fri, Mar 29, 2019 at 8:44 AM Liang Yang <[email protected]> wrote:
> >>
> >> Hi Martin,
> >>
> >> On 2019/3/29 2:03, Martin Blumenstingl wrote:
> >>> Hi Liang,
> >> [......]
> >>>> I don't think it is caused by a different NAND type, but i have followed
> >>>> the some test on my GXL platform. we can see the result from the
> >>>> attachment. By the way, i don't find any information about this on meson
> >>>> NFC datasheet, so i will ask our VLSI.
> >>>> Martin, May you reproduce it with the new patch on meson8b platform ? I
> >>>> need a more clear and easier compared log like gxl.txt. Thanks.
> >>> your gxl.txt is great, finally I can also compare my own results with
> >>> something that works for you!
> >>> in my results (see attachment) the "DATA_IN [256 B, force 8-bit]"
> >>> instructions result in a different info buffer output.
> >>> does this make any sense to you?
> >>>
> >> I have asked our VLSI designer for explanation or simulation result by
> >> an e-mail. Thanks.
> > do you have any update on this?
> Sorry. I haven't got reply from VLSI designer yet. We tried to improve
> priority yesterday, but i still can't estimate the time. There is no
> document or change list showing the difference between m8/b and gxl/axg
> serial chips. Now it seems that we can't use command NFC_CMD_N2M on nand
> initialization for m8/b chips and use *read byte from NFC fifo register*
> instead.
thank you for the status update!

I am trying to understand your suggestion not to use NFC_CMD_N2M:
the documentation (public S922X datasheet from Hardkernel: [0]) states
that P_NAND_BUF (NFC_REG_BUF in the meson_nand driver) can hold up to
four bytes of data. is this the "read byte from NFC FIFO register" you
mentioned?

Before I spend time changing the code to use the FIFO register I would
like to wait for an answer from your VLSI designer.
Setting the "correct" info buffer length for NFC_CMD_N2M on the 32-bit
SoCs seems like an easier solution compared to switching to the FIFO
register. Keeping NFC_CMD_N2M on the 32-bit SoCs also allows us to
have only one code-path for 32 and 64 bit SoCs, meaning we don't have
to maintain two separate code-paths for basically the same
functionality (assuming that NFC_CMD_N2M is not completely broken on
the 32-bit SoCs, we just don't know how to use it yet).

Regards
Martin

[0] https://dn.odroid.com/S922X/ODROID-N2/Datasheet/S922X_Public_Datasheet_V0.2.pdf

2019-04-11 03:01:34

by Liang Yang

[permalink] [raw]

Subject: Re: 32-bit Amlogic (ARM) SoC: kernel BUG in kfree()

Hi Martin,
On 2019/4/11 1:54, Martin Blumenstingl wrote:
> Hi Liang,
>
> thank you for the status update!
>
> I am trying to understand your suggestion not to use NFC_CMD_N2M:
> the documentation (public S922X datasheet from Hardkernel: [0]) states
> that P_NAND_BUF (NFC_REG_BUF in the meson_nand driver) can hold up to
> four bytes of data. is this the "read byte from NFC FIFO register" you
> mentioned?
>
You are right. Take the early meson NFC driver V2 from the previous mail
as a reference.

> Before I spend time changing the code to use the FIFO register I would
> like to wait for an answer from your VLSI designer.
> Setting the "correct" info buffer length for NFC_CMD_N2M on the 32-bit
> SoCs seems like an easier solution compared to switching to the FIFO
> register. Keeping NFC_CMD_N2M on the 32-bit SoCs also allows us to
> have only one code-path for 32 and 64 bit SoCs, meaning we don't have
> to maintain two separate code-paths for basically the same
> functionality (assuming that NFC_CMD_N2M is not completely broken on
> the 32-bit SoCs, we just don't know how to use it yet).
>
All right. I am also waiting for the answer.
>
> Regards
> Martin
>
>
> [0] https://dn.odroid.com/S922X/ODROID-N2/Datasheet/S922X_Public_Datasheet_V0.2.pdf

2019-06-08 20:02:05

by Martin Blumenstingl

[permalink] [raw]

Subject: Re: 32-bit Amlogic (ARM) SoC: kernel BUG in kfree()

Hi Liang,

On Thu, Apr 11, 2019 at 5:00 AM Liang Yang <[email protected]> wrote:
> All right. I am also waiting for the answer.
Do you have any update on this?

Martin