Hi,
On Fri, Sep 05, 2014 at 02:32:16PM -0700, Paul E. McKenney wrote:
> On Thu, Sep 04, 2014 at 03:04:03PM -0500, Felipe Balbi wrote:
> > Hi,
> >
> > On Thu, Sep 04, 2014 at 02:25:35PM -0500, Felipe Balbi wrote:
> > > On Thu, Sep 04, 2014 at 12:16:42PM -0700, Paul E. McKenney wrote:
> > > > On Thu, Sep 04, 2014 at 01:40:21PM -0500, Felipe Balbi wrote:
> > > > > Hi,
> > > > >
> > > > > I keep triggering the following Oops with -rc3 when writing to the mass
> > > > > storage gadget driver:
> > > >
> > > > v3.17-rc3, correct?
> > >
> > > yup, as in subject ;-)
> > >
> > > > I take it that the test passes on some earlier version?
> > >
> > > about to test v3.14.17.
> >
> > couldn't get v3.14 working on this board, but at least v3.16 is also
> > affected, except that now it happened during boot; I didn't even need
> > to run my test:
> >
> > [ 17.438195] Unable to handle kernel paging request at virtual address ffffffff
> > [ 17.446109] pgd = ec360000
> > [ 17.448947] [ffffffff] *pgd=ae7f6821, *pte=00000000, *ppte=00000000
> > [ 17.455639] Internal error: Oops: 17 [#1] SMP ARM
> > [ 17.460578] Modules linked in: dwc3(+) udc_core lis3lv02d_i2c lis3lv02d input_polldev dwc3_omap matrix_keypad
> > [ 17.471060] CPU: 0 PID: 1381 Comm: accounts-daemon Tainted: G W 3.16.0-00005-g8a6cdb4 #811
> > [ 17.480735] task: ed716040 ti: ec026000 task.ti: ec026000
> > [ 17.486405] PC is at find_get_entry+0x7c/0x128
> > [ 17.491070] LR is at 0xfffffffa
> > [ 17.494364] pc : [<c0110b4c>] lr : [<fffffffa>] psr: a0000013
> > [ 17.494364] sp : ec027dc8 ip : 00000000 fp : ec027dfc
> > [ 17.506384] r10: c0c6f6bc r9 : 00000005 r8 : ecdf22f8
> > [ 17.511860] r7 : ec026008 r6 : 00000001 r5 : 00000000 r4 : 00000000
> > [ 17.518705] r3 : ec027db4 r2 : 00000000 r1 : 00000005 r0 : ffffffff
> > [ 17.525526] Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
> > [ 17.533007] Control: 10c5387d Table: ac360059 DAC: 00000015
> > [ 17.539020] Process accounts-daemon (pid: 1381, stack limit = 0xec026248)
> > [ 17.546151] Stack: (0xec027dc8 to 0xec028000)
> > [ 17.550710] 7dc0: 00000000 00000000 c0110ad0 ecdf0b80 00000000 ecdf22f4
> > [ 17.559259] 7de0: ecdf22f4 00000000 00000005 00000000 ec027e34 ec027e00 c0111874 c0110adc
> > [ 17.567824] 7e00: ecdf0b80 c03565b4 ed7165f8 ec3dddf0 ecdf22f4 00000005 ec3ddd00 00000001
> > [ 17.576385] 7e20: ecdf21a0 00000000 ec027ebc ec027e38 c0112978 c0111844 00000000 c06af938
> > [ 17.584950] 7e40: ecdf0b70 ecdf0b70 ec027e6c ec027e58 00000005 00000006 00000b80 ecdf0b70
> > [ 17.593514] 7e60: 00000000 c0163264 ec3dddf0 ec027ee8 ec027ed4 00000b80 ec027eac ec027e88
> > [ 17.602087] 7e80: c0178d98 c0356590 00000000 00000000 00020000 00005b80 00000000 ec027f78
> > [ 17.610653] 7ea0: ec3ddd00 ed716040 b6cab018 00000000 ec027f44 ec027ec0 c0163264 c0112780
> > [ 17.619202] 7ec0: 00000180 00000180 ec027efc b6cab018 00000180 00000000 00000000 00000180
> > [ 17.627772] 7ee0: ec027ecc 00000001 ec3ddd00 00000000 00000000 00000000 ed716040 00000000
> > [ 17.636371] 7f00: 00000000 00000000 00005b80 00000000 00000180 00000000 00000000 00000000
> > [ 17.644946] 7f20: b6cab018 ec3ddd00 b6cab018 ec027f78 ec3ddd00 00000180 ec027f74 ec027f48
> > [ 17.653524] 7f40: c0163a6c c01631cc b6cab018 00000000 00005b80 00000000 ec3ddd03 ec3ddd00
> > [ 17.662085] 7f60: 00000180 b6cab018 ec027fa4 ec027f78 c0164198 c01639e0 00005b80 00000000
> > [ 17.670658] 7f80: be91badc be91ba50 00044a00 00000003 c000f044 ec026000 00000000 ec027fa8
> > [ 17.679222] 7fa0: c000edc0 c0164158 be91badc be91ba50 00000008 b6cab018 00000180 be91ba38
> > [ 17.687794] 7fc0: be91badc be91ba50 00044a00 00000003 be91bbac b6cab008 00000000 00000000
> > [ 17.696370] 7fe0: 00000020 be91ba40 b6c78e8c b6c78ea8 60000010 00000008 ae7f6821 ae7f6c21
> > [ 17.704956] [<c0110b4c>] (find_get_entry) from [<c0111874>] (pagecache_get_page+0x3c/0x1f4)
> > [ 17.713687] [<c0111874>] (pagecache_get_page) from [<c0112978>] (generic_file_read_iter+0x204/0x794)
> > [ 17.723259] [<c0112978>] (generic_file_read_iter) from [<c0163264>] (new_sync_read+0xa4/0xcc)
> > [ 17.732185] [<c0163264>] (new_sync_read) from [<c0163a6c>] (vfs_read+0x98/0x158)
> > [ 17.739945] [<c0163a6c>] (vfs_read) from [<c0164198>] (SyS_read+0x4c/0xa0)
> > [ 17.747149] [<c0164198>] (SyS_read) from [<c000edc0>] (ret_fast_syscall+0x0/0x48)
> > [ 17.754994] Code: e1a01009 eb08ffa9 e3500000 0a00001f (e5904000)
> > [ 17.761476] ---[ end trace 49c4ed35a1c01157 ]---
> >
> > It seems to be a difficult-to-reproduce race though. On a second boot it
> > didn't die during boot, but died with my USB test case. Unfortunately,
> > the platform I'm using is pretty new and only goes as far back as v3.16
> > (to which I had to backport 11 patches to get it to boot well enough for
> > this test).
> >
> > I wonder if a corrupt file system could cause such problems... I keep
> > seeing EXT4 errors every now and again; considering that this dies in a
> > path through VFS, I wonder...
>
> I recall hearing of similar things in the past, but must defer to the
> FS/VFS experts on this one.

resurrecting this thread. I'm facing the same issues with a brand new
filesystem mounted through NFS. The way to reproduce is the same though:
using g_mass_storage with either tmpfs or mmc as backing store.

However it seems to die much more frequently than before. I can
reproduce all the time. It's definitely not a problem with my board as I
have two boards with different SoCs (ARM Cortex A8 and ARM Cortex A9)
with two different USB peripheral controllers (MUSB and DWC3), using the
same rootfs and they die the exact same way no matter if I use tmpfs or
MMC as backing store.

Adding a few more folks here.

--
balbi
Hi,
On Wed, Oct 08, 2014 at 12:13:22PM -0500, Felipe Balbi wrote:
> On Fri, Sep 05, 2014 at 02:32:16PM -0700, Paul E. McKenney wrote:
> > On Thu, Sep 04, 2014 at 03:04:03PM -0500, Felipe Balbi wrote:
> > > Hi,
> > >
> > > On Thu, Sep 04, 2014 at 02:25:35PM -0500, Felipe Balbi wrote:
> > > > On Thu, Sep 04, 2014 at 12:16:42PM -0700, Paul E. McKenney wrote:
> > > > > On Thu, Sep 04, 2014 at 01:40:21PM -0500, Felipe Balbi wrote:
> > > > > > Hi,
> > > > > >
> > > > > > I keep triggering the following Oops with -rc3 when writing to the mass
> > > > > > storage gadget driver:
> > > > >
> > > > > v3.17-rc3, correct?
> > > >
> > > > yup, as in subject ;-)
> > > >
> > > > > I take it that the test passes on some earlier version?
> > > >
> > > > about to test v3.14.17.
> > >
> > > couldn't get v3.14 working on this board, but at least v3.16 is also
> > > affected, except that now it happened during boot; I didn't even need
> > > to run my test:
> > >
> > > [ 17.438195] Unable to handle kernel paging request at virtual address ffffffff
> > > [ 17.446109] pgd = ec360000
> > > [ 17.448947] [ffffffff] *pgd=ae7f6821, *pte=00000000, *ppte=00000000
> > > [ 17.455639] Internal error: Oops: 17 [#1] SMP ARM
> > > [ 17.460578] Modules linked in: dwc3(+) udc_core lis3lv02d_i2c lis3lv02d input_polldev dwc3_omap matrix_keypad
> > > [ 17.471060] CPU: 0 PID: 1381 Comm: accounts-daemon Tainted: G W 3.16.0-00005-g8a6cdb4 #811
> > > [ 17.480735] task: ed716040 ti: ec026000 task.ti: ec026000
> > > [ 17.486405] PC is at find_get_entry+0x7c/0x128
> > > [ 17.491070] LR is at 0xfffffffa
> > > [ 17.494364] pc : [<c0110b4c>] lr : [<fffffffa>] psr: a0000013
> > > [ 17.494364] sp : ec027dc8 ip : 00000000 fp : ec027dfc
> > > [ 17.506384] r10: c0c6f6bc r9 : 00000005 r8 : ecdf22f8
> > > [ 17.511860] r7 : ec026008 r6 : 00000001 r5 : 00000000 r4 : 00000000
> > > [ 17.518705] r3 : ec027db4 r2 : 00000000 r1 : 00000005 r0 : ffffffff
> > > [ 17.525526] Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
> > > [ 17.533007] Control: 10c5387d Table: ac360059 DAC: 00000015
> > > [ 17.539020] Process accounts-daemon (pid: 1381, stack limit = 0xec026248)
> > > [ 17.546151] Stack: (0xec027dc8 to 0xec028000)
> > > [ 17.550710] 7dc0: 00000000 00000000 c0110ad0 ecdf0b80 00000000 ecdf22f4
> > > [ 17.559259] 7de0: ecdf22f4 00000000 00000005 00000000 ec027e34 ec027e00 c0111874 c0110adc
> > > [ 17.567824] 7e00: ecdf0b80 c03565b4 ed7165f8 ec3dddf0 ecdf22f4 00000005 ec3ddd00 00000001
> > > [ 17.576385] 7e20: ecdf21a0 00000000 ec027ebc ec027e38 c0112978 c0111844 00000000 c06af938
> > > [ 17.584950] 7e40: ecdf0b70 ecdf0b70 ec027e6c ec027e58 00000005 00000006 00000b80 ecdf0b70
> > > [ 17.593514] 7e60: 00000000 c0163264 ec3dddf0 ec027ee8 ec027ed4 00000b80 ec027eac ec027e88
> > > [ 17.602087] 7e80: c0178d98 c0356590 00000000 00000000 00020000 00005b80 00000000 ec027f78
> > > [ 17.610653] 7ea0: ec3ddd00 ed716040 b6cab018 00000000 ec027f44 ec027ec0 c0163264 c0112780
> > > [ 17.619202] 7ec0: 00000180 00000180 ec027efc b6cab018 00000180 00000000 00000000 00000180
> > > [ 17.627772] 7ee0: ec027ecc 00000001 ec3ddd00 00000000 00000000 00000000 ed716040 00000000
> > > [ 17.636371] 7f00: 00000000 00000000 00005b80 00000000 00000180 00000000 00000000 00000000
> > > [ 17.644946] 7f20: b6cab018 ec3ddd00 b6cab018 ec027f78 ec3ddd00 00000180 ec027f74 ec027f48
> > > [ 17.653524] 7f40: c0163a6c c01631cc b6cab018 00000000 00005b80 00000000 ec3ddd03 ec3ddd00
> > > [ 17.662085] 7f60: 00000180 b6cab018 ec027fa4 ec027f78 c0164198 c01639e0 00005b80 00000000
> > > [ 17.670658] 7f80: be91badc be91ba50 00044a00 00000003 c000f044 ec026000 00000000 ec027fa8
> > > [ 17.679222] 7fa0: c000edc0 c0164158 be91badc be91ba50 00000008 b6cab018 00000180 be91ba38
> > > [ 17.687794] 7fc0: be91badc be91ba50 00044a00 00000003 be91bbac b6cab008 00000000 00000000
> > > [ 17.696370] 7fe0: 00000020 be91ba40 b6c78e8c b6c78ea8 60000010 00000008 ae7f6821 ae7f6c21
> > > [ 17.704956] [<c0110b4c>] (find_get_entry) from [<c0111874>] (pagecache_get_page+0x3c/0x1f4)
> > > [ 17.713687] [<c0111874>] (pagecache_get_page) from [<c0112978>] (generic_file_read_iter+0x204/0x794)
> > > [ 17.723259] [<c0112978>] (generic_file_read_iter) from [<c0163264>] (new_sync_read+0xa4/0xcc)
> > > [ 17.732185] [<c0163264>] (new_sync_read) from [<c0163a6c>] (vfs_read+0x98/0x158)
> > > [ 17.739945] [<c0163a6c>] (vfs_read) from [<c0164198>] (SyS_read+0x4c/0xa0)
> > > [ 17.747149] [<c0164198>] (SyS_read) from [<c000edc0>] (ret_fast_syscall+0x0/0x48)
> > > [ 17.754994] Code: e1a01009 eb08ffa9 e3500000 0a00001f (e5904000)
> > > [ 17.761476] ---[ end trace 49c4ed35a1c01157 ]---
> > >
> > > It seems to be a difficult-to-reproduce race though. On a second boot it
> > > didn't die during boot, but died with my USB test case. Unfortunately,
> > > the platform I'm using is pretty new and only goes as far back as v3.16
> > > (to which I had to backport 11 patches to get it to boot well enough for
> > > this test).
> > >
> > > I wonder if a corrupt file system could cause such problems... I keep
> > > seeing EXT4 errors every now and again; considering that this dies in a
> > > path through VFS, I wonder...
> >
> > I recall hearing of similar things in the past, but must defer to the
> > FS/VFS experts on this one.
>
> resurrecting this thread. I'm facing the same issues with a brand new
> filesystem mounted through NFS. The way to reproduce is the same though:
> using g_mass_storage with either tmpfs or mmc as backing store.
>
> However it seems to die much more frequently than before. I can
> reproduce all the time. It's definitely not a problem with my board as I
> have two boards with different SoCs (ARM Cortex A8 and ARM Cortex A9)
> with two different USB peripheral controllers (MUSB and DWC3), using the
> same rootfs and they die the exact same way no matter if I use tmpfs or
> MMC as backing store.
>
> Adding a few more folks here.

alright, the first kernel I found to be stable on Cortex A8 is v3.14.
Every version from v3.15 up to today's Linus tree dies. I'll start
bisecting now.

--
balbi
Hi,
On Wed, Oct 08, 2014 at 12:57:07PM -0500, Felipe Balbi wrote:
[ snip ]
> > > > It seems to be a difficult-to-reproduce race though. On a second boot it
> > > > didn't die during boot, but died with my USB test case. Unfortunately,
> > > > the platform I'm using is pretty new and only goes as far back as v3.16
> > > > (to which I had to backport 11 patches to get it to boot well enough for
> > > > this test).
> > > >
> > > > I wonder if a corrupt file system could cause such problems... I keep
> > > > seeing EXT4 errors every now and again; considering that this dies in a
> > > > path through VFS, I wonder...
> > >
> > > I recall hearing of similar things in the past, but must defer to the
> > > FS/VFS experts on this one.
> >
> > resurrecting this thread. I'm facing the same issues with a brand new
> > filesystem mounted through NFS. The way to reproduce is the same though:
> > using g_mass_storage with either tmpfs or mmc as backing store.
> >
> > However it seems to die much more frequently than before. I can
> > reproduce all the time. It's definitely not a problem with my board as I
> > have two boards with different SoCs (ARM Cortex A8 and ARM Cortex A9)
> > with two different USB peripheral controllers (MUSB and DWC3), using the
> > same rootfs and they die the exact same way no matter if I use tmpfs or
> > MMC as backing store.
> >
> > Adding a few more folks here.
>
> alright, first stable kernel with Cortex A8 was v3.14. All other kernel
> versions die starting with v3.15 to today's Linus. I'll start bisecting
> now.

Finally bisected it down to commit 139e561660fe11e0fc35e142a800df3dd7d03e9d
(lib: radix_tree: tree node interface). Here's the full bisect log:

git bisect start
# good: [455c6fdbd219161bd09b1165f11699d6d73de11c] Linux 3.14
git bisect good 455c6fdbd219161bd09b1165f11699d6d73de11c
# bad: [1860e379875dfe7271c649058aeddffe5afd9d0d] Linux 3.15
git bisect bad 1860e379875dfe7271c649058aeddffe5afd9d0d
# bad: [74a475acea49459721ae4b062d3da68c74259009] SubmittingPatches: add style recommendation to use imperative descriptions
git bisect bad 74a475acea49459721ae4b062d3da68c74259009
# good: [c12e69c6aaf785fd307d05cb6f36ca0e7577ead7] Merge tag 'staging-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
git bisect good c12e69c6aaf785fd307d05cb6f36ca0e7577ead7
# good: [0fc31966035d7a540c011b6c967ce8eae1db121b] Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
git bisect good 0fc31966035d7a540c011b6c967ce8eae1db121b
# good: [bdfc7cbdeef8cadba0e5793079ac0130b8e2220c] Merge branch 'mips-for-linux-next' of git://git.linux-mips.org/pub/scm/ralf/upstream-sfr
git bisect good bdfc7cbdeef8cadba0e5793079ac0130b8e2220c
# good: [0f1b1e6d73cb989ce2c071edc57deade3b084dfe] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
git bisect good 0f1b1e6d73cb989ce2c071edc57deade3b084dfe
# good: [181e7d5d7bd7747e882e3ca89ecbf0fc3e72d0da] ixgbe: remove redundant if clause from PTP work
git bisect good 181e7d5d7bd7747e882e3ca89ecbf0fc3e72d0da
# good: [59ecc26004e77e100c700b1d0da7502b0fdadb46] Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
git bisect good 59ecc26004e77e100c700b1d0da7502b0fdadb46
# good: [2b665e276c15ba7d9fc8cdd16931883a51ed13e4] fs/direct-io.c: remove redundant comparison
git bisect good 2b665e276c15ba7d9fc8cdd16931883a51ed13e4
# bad: [f412c97abef71026d8192ca8efca231f1e3906b3] mm, hugetlb: mark some bootstrap functions as __init
git bisect bad f412c97abef71026d8192ca8efca231f1e3906b3
# good: [4e35f483850ba46b838adfd312b3052416e15204] mm, hugetlb: use vma_resv_map() map types
git bisect good 4e35f483850ba46b838adfd312b3052416e15204
# good: [6dbaf22ce1f1dfba33313198eb5bd989ae76dd87] mm: shmem: save one radix tree lookup when truncating swapped pages
git bisect good 6dbaf22ce1f1dfba33313198eb5bd989ae76dd87
# good: [91b0abe36a7b2b3b02d7500925a5f8455334f0e5] mm + fs: store shadow entries in page cache
git bisect good 91b0abe36a7b2b3b02d7500925a5f8455334f0e5
# bad: [139e561660fe11e0fc35e142a800df3dd7d03e9d] lib: radix_tree: tree node interface
git bisect bad 139e561660fe11e0fc35e142a800df3dd7d03e9d
# good: [a528910e12ec7ee203095eb1711468a66b9b60b0] mm: thrash detection-based file cache sizing
git bisect good a528910e12ec7ee203095eb1711468a66b9b60b0
# first bad commit: [139e561660fe11e0fc35e142a800df3dd7d03e9d] lib: radix_tree: tree node interface
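
The log above is in the plain-text form that `git bisect log` prints: each verdict is recorded as a `# good:` or `# bad:` comment, followed by the command that set it. As a minimal sketch (the function name `summarize_bisect_log` is hypothetical and not part of git), such a log can be tallied like this:

```python
import re

# Matches the comment lines git writes into a bisect log, e.g.
#   # good: [<40-hex sha>] <subject>
#   # first bad commit: [<40-hex sha>] <subject>
_ENTRY = re.compile(r"^# (good|bad|first bad commit): \[([0-9a-f]{40})\] (.*)$")


def summarize_bisect_log(text):
    """Return (good_count, bad_count, first_bad) for a bisect log.

    first_bad is a (sha, subject) tuple, or None if the log is unfinished.
    """
    good = bad = 0
    first_bad = None
    for line in text.splitlines():
        match = _ENTRY.match(line.strip())
        if not match:
            continue  # skip the "git bisect ..." command lines
        verdict, sha, subject = match.groups()
        if verdict == "good":
            good += 1
        elif verdict == "bad":
            bad += 1
        else:
            first_bad = (sha, subject)
    return good, bad, first_bad


# Shortened excerpt of the log above, used only to exercise the parser.
sample = """\
git bisect start
# good: [455c6fdbd219161bd09b1165f11699d6d73de11c] Linux 3.14
git bisect good 455c6fdbd219161bd09b1165f11699d6d73de11c
# bad: [1860e379875dfe7271c649058aeddffe5afd9d0d] Linux 3.15
git bisect bad 1860e379875dfe7271c649058aeddffe5afd9d0d
# first bad commit: [139e561660fe11e0fc35e142a800df3dd7d03e9d] lib: radix_tree: tree node interface
"""
print(summarize_bisect_log(sample))
```

Saved to a file, the same log can be fed to `git bisect replay <file>` to repeat the bisection on another tree or machine.
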

I tried reverting that commit on v3.15 but it's non-trivial; I'll leave
that for tomorrow. Meanwhile, I'm adding the folks involved with that commit
to the Cc list, along with another backtrace for reference:

[ 113.696647] Unable to handle kernel paging request at virtual address ffffffff
[ 113.704370] pgd = c0004000
[ 113.707276] [ffffffff] *pgd=9fef6821, *pte=00000000, *ppte=00000000
[ 113.713998] Internal error: Oops: 17 [#1] SMP ARM
[ 113.718912] Modules linked in: g_mass_storage usb_f_mass_storage libcomposite configfs musb_dsps musb_hdrc musb_am335x
[ 113.730144] CPU: 0 PID: 1368 Comm: file-storage Not tainted 3.17.0-02899-g748eb79 #239
[ 113.738410] task: de606e00 ti: dd0ba000 task.ti: dd0ba000
[ 113.744060] PC is at find_get_entry+0x64/0x100
[ 113.748700] LR is at 0xfffffffa
[ 113.751978] pc : [<c01065b4>] lr : [<fffffffa>] psr: a00f0013
[ 113.751978] sp : dd0bbba0 ip : 00000000 fp : dd0bbbd4
[ 113.763962] r10: c0665100 r9 : 00001000 r8 : 0000001a
[ 113.769415] r7 : dd0ee9b8 r6 : 00000001 r5 : 00000000 r4 : dd0ee880
[ 113.776228] r3 : dd0bbb8c r2 : 00000000 r1 : 0000001a r0 : ffffffff
[ 113.783044] Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel
[ 113.790674] Control: 10c5387d Table: 9e210019 DAC: 00000015
[ 113.796672] Process file-storage (pid: 1368, stack limit = 0xdd0ba248)
[ 113.803486] Stack: (0xdd0bbba0 to 0xdd0bc000)
[ 113.808038] bba0: 00000000 00000000 c0106550 00017508 00000002 dd0ee880 dd0ee9b4 0000001a
[ 113.816578] bbc0: 00001000 00000000 dd0bbbf4 dd0bbbd8 c010716c c010655c 00013ef0 dd0ee880
[ 113.825118] bbe0: dd0bbda4 00000003 dd0bbc6c dd0bbbf8 c011df94 c0107150 dd0bbc2c c0106b9c
[ 113.833657] bc00: c0089a3c c0089328 00000001 c0107080 00000002 dd0bbcc0 000000d0 00000000
[ 113.842197] bc20: 0001a000 00000000 00000000 dd0ee9b4 0000001a c011e74c dd0bbc94 dd0bbc48
[ 113.850736] bc40: c011beec 00001000 dd0bbda4 dd0ee9b4 00001000 00000000 00001000 c0665100
[ 113.859276] bc60: dd0bbc94 dd0bbc70 c011e74c c011df08 000200da 00000000 00001000 dd0bbda4
[ 113.867816] bc80: dd0ee9b4 00001000 dd0bbcf4 dd0bbc98 c0106b10 c011e700 00001000 00000001
[ 113.876356] bca0: dd0bbcc0 dd0bbcc4 dd0ba000 00000001 de60ee40 00002000 0001a000 00000000
[ 113.884896] bcc0: dfe71ac0 c00a3b60 54355ca1 00004000 de60ee40 00000000 dd0bbdb8 dd0ee9b4
[ 113.893436] bce0: dd0ee880 ffffffff dd0bbd5c dd0bbcf8 c0108c6c c0106a68 dd0bbd5c dd0bbd08
[ 113.901975] bd00: c064b790 c0089c48 00000001 dd0ba038 c0108f70 c0089328 00000001 c0108f7c
[ 113.910515] bd20: dd0bbda4 de606e00 00018000 00000000 dd0bbd5c dd0bbdb8 dd0ee920 dd0bbda4
[ 113.919055] bd40: de60ee40 de606e00 dd0e5000 de664a00 dd0bbd8c dd0bbd60 c0108f7c c0108a24
[ 113.927595] bd60: c008c410 c0089fd0 00000001 00000000 00018000 00000000 dd0bbe80 de60ee40
[ 113.936134] bd80: dd0bbe14 dd0bbd90 c014c920 c0108f40 00004000 00000001 00000001 de274000
[ 113.944674] bda0: 00004000 00000003 00002000 00002000 dd0bbd9c 00000001 de60ee40 00000000
[ 113.953214] bdc0: 00000000 00000000 de606e00 00000000 00000000 00000000 00018000 00000000
[ 113.961753] bde0: 00004000 00000000 00000000 00000000 de274000 de60ee40 de274000 dd0bbe80
[ 113.970293] be00: 00004000 de6ce9c0 dd0bbe44 dd0bbe18 c014d1c8 c014c888 00000002 de6ce9c0
[ 113.978833] be20: 00004000 00000000 00000000 00008000 de6ce9c0 dd0e5000 dd0bbeb4 dd0bbe48
[ 113.987373] be40: bf059cc4 c014d120 00000000 dd0bbe9c dd0bbe68 bf05a04c 19000000 00000000
[ 113.995912] be60: dd0ba000 00000000 00000000 6f48202c 00018000 00000000 00020000 00000000
[ 114.004452] be80: 00018000 00000000 00000000 de664a00 de6ce9c0 00000000 de664a38 de664a00
[ 114.012992] bea0: dd0ba038 de664a7c dd0bbf24 dd0bbeb8 bf05a938 bf059980 00000001 c00899dc
[ 114.021531] bec0: a00f0013 de2e3bd4 00000000 00052000 00000000 dd0bbee0 c0089c50 c0089a70
[ 114.030071] bee0: dd0bbf04 dd0bbef0 c064f3a4 de6ce840 00000000 de664a00 bf05a244 de6ce840
[ 114.038611] bf00: 00000000 de664a00 bf05a244 00000000 00000000 00000000 dd0bbfac dd0bbf28
[ 114.047151] bf20: c0065bdc bf05a250 c0089c50 00000000 dd0bbf54 de664a00 00000000 00000000
[ 114.055690] bf40: dead4ead ffffffff ffffffff c0a8a238 00000000 00000000 c08070f8 dd0bbf5c
[ 114.064230] bf60: dd0bbf5c 00000000 00000000 dead4ead ffffffff ffffffff c0a8a238 00000000
[ 114.072770] bf80: 00000000 c08070f8 dd0bbf88 dd0bbf88 de6ce840 c0065af8 00000000 00000000
[ 114.081310] bfa0: 00000000 dd0bbfb0 c000eea8 c0065b04 00000000 00000000 00000000 00000000
[ 114.089850] bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 114.098389] bfe0: 00000000 00000000 00000000 00000000 00000013 00000000 0001086e 00001a02
[ 114.106944] [<c01065b4>] (find_get_entry) from [<c010716c>] (find_lock_entry+0x28/0x7c)
[ 114.115316] [<c010716c>] (find_lock_entry) from [<c011df94>] (shmem_getpage_gfp+0x98/0x7f8)
[ 114.124042] [<c011df94>] (shmem_getpage_gfp) from [<c011e74c>] (shmem_write_begin+0x58/0x94)
[ 114.132856] [<c011e74c>] (shmem_write_begin) from [<c0106b10>] (generic_perform_write+0xb4/0x1c8)
[ 114.142124] [<c0106b10>] (generic_perform_write) from [<c0108c6c>] (__generic_file_write_iter+0x254/0x51c)
[ 114.152208] [<c0108c6c>] (__generic_file_write_iter) from [<c0108f7c>] (generic_file_write_iter+0x48/0xdc)
[ 114.162298] [<c0108f7c>] (generic_file_write_iter) from [<c014c920>] (new_sync_write+0xa4/0xcc)
[ 114.171386] [<c014c920>] (new_sync_write) from [<c014d1c8>] (vfs_write+0xb4/0x1c0)
[ 114.179334] [<c014d1c8>] (vfs_write) from [<bf059cc4>] (do_write+0x350/0x4b8 [usb_f_mass_storage])
[ 114.188719] [<bf059cc4>] (do_write [usb_f_mass_storage]) from [<bf05a938>] (fsg_main_thread+0x6f4/0x13f8 [usb_f_mass_storage])
[ 114.200636] [<bf05a938>] (fsg_main_thread [usb_f_mass_storage]) from [<c0065bdc>] (kthread+0xe4/0x100)
[ 114.210368] [<c0065bdc>] (kthread) from [<c000eea8>] (ret_from_fork+0x14/0x20)
[ 114.217914] Code: e1a01008 eb08abbe e3500000 0a00001b (e5904000)
[ 114.224529] ---[ end trace afb7e71d4b71be98 ]---
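
The `Code:` line of the oops marks the faulting instruction in parentheses. As a rough sketch, assuming plain ARM (A32) encoding as the `ISA ARM` flag in the dump suggests, the word `e5904000` can be decoded by hand. The helper names below are hypothetical, the condition field is ignored, and only the immediate-offset LDR/STR form is handled:

```python
import re


def faulting_word(code_line):
    """Extract the parenthesized (faulting) instruction word from a Code: line."""
    match = re.search(r"\(([0-9a-fA-F]{8})\)", code_line)
    if match is None:
        raise ValueError("no parenthesized instruction word found")
    return int(match.group(1), 16)


def decode_ldr_str_imm(word):
    """Decode an A32 LDR/STR with a 12-bit immediate offset.

    Returns a dict of fields, or None if the word is some other instruction.
    """
    if (word >> 26) & 0b11 != 0b01:
        return None  # bits 27:26 must be 01 for a single data transfer
    if (word >> 25) & 1:
        return None  # bit 25 set means a register offset, not handled here
    imm = word & 0xFFF
    return {
        "mnemonic": ("ldr" if (word >> 20) & 1 else "str")
        + ("b" if (word >> 22) & 1 else ""),
        "rd": (word >> 12) & 0xF,  # register being loaded or stored
        "rn": (word >> 16) & 0xF,  # base register
        "offset": imm if (word >> 23) & 1 else -imm,  # U bit: add or subtract
        "pre_indexed": bool((word >> 24) & 1),
        "writeback": bool((word >> 21) & 1),
    }


line = "Code: e1a01008 eb08abbe e3500000 0a00001b (e5904000)"
fields = decode_ldr_str_imm(faulting_word(line))
print("{mnemonic} r{rd}, [r{rn}, #{offset}]".format(**fields))
```

For this trace the result is a load through r0 with a zero offset, and the register dump shows r0 : ffffffff, the same value as the faulting virtual address.
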

For those joining late: the problem happens when I use g_mass_storage on
either Cortex A8 or Cortex A9, with two different USB peripheral
controllers (MUSB and DWC3), using either tmpfs or MMC as backing store.

--
balbi
Hi Felipe,
On Wed, Oct 08, 2014 at 04:29:38PM -0500, Felipe Balbi wrote:
> Finally bisected it down to commit 139e561660fe11e0fc35e142a800df3dd7d03e9d
> (lib: radix_tree: tree node interface). Here's full bisect log:
>
> git bisect start
> # good: [455c6fdbd219161bd09b1165f11699d6d73de11c] Linux 3.14
> git bisect good 455c6fdbd219161bd09b1165f11699d6d73de11c
> # bad: [1860e379875dfe7271c649058aeddffe5afd9d0d] Linux 3.15
> git bisect bad 1860e379875dfe7271c649058aeddffe5afd9d0d
> # bad: [74a475acea49459721ae4b062d3da68c74259009] SubmittingPatches: add style recommendation to use imperative descriptions
> git bisect bad 74a475acea49459721ae4b062d3da68c74259009
> # good: [c12e69c6aaf785fd307d05cb6f36ca0e7577ead7] Merge tag 'staging-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
> git bisect good c12e69c6aaf785fd307d05cb6f36ca0e7577ead7
> # good: [0fc31966035d7a540c011b6c967ce8eae1db121b] Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
> git bisect good 0fc31966035d7a540c011b6c967ce8eae1db121b
> # good: [bdfc7cbdeef8cadba0e5793079ac0130b8e2220c] Merge branch 'mips-for-linux-next' of git://git.linux-mips.org/pub/scm/ralf/upstream-sfr
> git bisect good bdfc7cbdeef8cadba0e5793079ac0130b8e2220c
> # good: [0f1b1e6d73cb989ce2c071edc57deade3b084dfe] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
> git bisect good 0f1b1e6d73cb989ce2c071edc57deade3b084dfe
> # good: [181e7d5d7bd7747e882e3ca89ecbf0fc3e72d0da] ixgbe: remove redundant if clause from PTP work
> git bisect good 181e7d5d7bd7747e882e3ca89ecbf0fc3e72d0da
> # good: [59ecc26004e77e100c700b1d0da7502b0fdadb46] Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
> git bisect good 59ecc26004e77e100c700b1d0da7502b0fdadb46
> # good: [2b665e276c15ba7d9fc8cdd16931883a51ed13e4] fs/direct-io.c: remove redundant comparison
> git bisect good 2b665e276c15ba7d9fc8cdd16931883a51ed13e4
> # bad: [f412c97abef71026d8192ca8efca231f1e3906b3] mm, hugetlb: mark some bootstrap functions as __init
> git bisect bad f412c97abef71026d8192ca8efca231f1e3906b3
> # good: [4e35f483850ba46b838adfd312b3052416e15204] mm, hugetlb: use vma_resv_map() map types
> git bisect good 4e35f483850ba46b838adfd312b3052416e15204
> # good: [6dbaf22ce1f1dfba33313198eb5bd989ae76dd87] mm: shmem: save one radix tree lookup when truncating swapped pages
> git bisect good 6dbaf22ce1f1dfba33313198eb5bd989ae76dd87
> # good: [91b0abe36a7b2b3b02d7500925a5f8455334f0e5] mm + fs: store shadow entries in page cache
> git bisect good 91b0abe36a7b2b3b02d7500925a5f8455334f0e5
> # bad: [139e561660fe11e0fc35e142a800df3dd7d03e9d] lib: radix_tree: tree node interface
> git bisect bad 139e561660fe11e0fc35e142a800df3dd7d03e9d
> # good: [a528910e12ec7ee203095eb1711468a66b9b60b0] mm: thrash detection-based file cache sizing
> git bisect good a528910e12ec7ee203095eb1711468a66b9b60b0
> # first bad commit: [139e561660fe11e0fc35e142a800df3dd7d03e9d] lib: radix_tree: tree node interface
>
> I tried reverting that commit on v3.15 but it's non-trivial; I'll leave
> that for tomorrow. Meanwhile, adding folks involved with that commit to
> Cc list and another backtrace for reference:
>
> [ 113.696647] Unable to handle kernel paging request at virtual address ffffffff
> [ 113.704370] pgd = c0004000
> [ 113.707276] [ffffffff] *pgd=9fef6821, *pte=00000000, *ppte=00000000
> [ 113.713998] Internal error: Oops: 17 [#1] SMP ARM
> [ 113.718912] Modules linked in: g_mass_storage usb_f_mass_storage libcomposite configfs musb_dsps musb_hdrc musb_am335x
> [ 113.730144] CPU: 0 PID: 1368 Comm: file-storage Not tainted 3.17.0-02899-g748eb79 #239
> [ 113.738410] task: de606e00 ti: dd0ba000 task.ti: dd0ba000
> [ 113.744060] PC is at find_get_entry+0x64/0x100

Could you please provide the disassembly of that function?

I'm thinking it's not the slot pointer itself that's bad, because
__radix_tree_lookup() dereferences that to test if it's populated
before returning it, and slot life-time is guaranteed by RCU.

That would only leave garbage in the slot itself, crashing during
page_cache_get_speculative().

I'll keep staring at this change, but nothing stands out to me yet.

Thanks,
Johannes

> [ 113.748700] LR is at 0xfffffffa
> [ 113.751978] pc : [<c01065b4>] lr : [<fffffffa>] psr: a00f0013
> [ 113.751978] sp : dd0bbba0 ip : 00000000 fp : dd0bbbd4
> [ 113.763962] r10: c0665100 r9 : 00001000 r8 : 0000001a
> [ 113.769415] r7 : dd0ee9b8 r6 : 00000001 r5 : 00000000 r4 : dd0ee880
> [ 113.776228] r3 : dd0bbb8c r2 : 00000000 r1 : 0000001a r0 : ffffffff
> [ 113.783044] Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel
> [ 113.790674] Control: 10c5387d Table: 9e210019 DAC: 00000015
> [ 113.796672] Process file-storage (pid: 1368, stack limit = 0xdd0ba248)
> [ 113.803486] Stack: (0xdd0bbba0 to 0xdd0bc000)
> [ 113.808038] bba0: 00000000 00000000 c0106550 00017508 00000002 dd0ee880 dd0ee9b4 0000001a
> [ 113.816578] bbc0: 00001000 00000000 dd0bbbf4 dd0bbbd8 c010716c c010655c 00013ef0 dd0ee880
> [ 113.825118] bbe0: dd0bbda4 00000003 dd0bbc6c dd0bbbf8 c011df94 c0107150 dd0bbc2c c0106b9c
> [ 113.833657] bc00: c0089a3c c0089328 00000001 c0107080 00000002 dd0bbcc0 000000d0 00000000
> [ 113.842197] bc20: 0001a000 00000000 00000000 dd0ee9b4 0000001a c011e74c dd0bbc94 dd0bbc48
> [ 113.850736] bc40: c011beec 00001000 dd0bbda4 dd0ee9b4 00001000 00000000 00001000 c0665100
> [ 113.859276] bc60: dd0bbc94 dd0bbc70 c011e74c c011df08 000200da 00000000 00001000 dd0bbda4
> [ 113.867816] bc80: dd0ee9b4 00001000 dd0bbcf4 dd0bbc98 c0106b10 c011e700 00001000 00000001
> [ 113.876356] bca0: dd0bbcc0 dd0bbcc4 dd0ba000 00000001 de60ee40 00002000 0001a000 00000000
> [ 113.884896] bcc0: dfe71ac0 c00a3b60 54355ca1 00004000 de60ee40 00000000 dd0bbdb8 dd0ee9b4
> [ 113.893436] bce0: dd0ee880 ffffffff dd0bbd5c dd0bbcf8 c0108c6c c0106a68 dd0bbd5c dd0bbd08
> [ 113.901975] bd00: c064b790 c0089c48 00000001 dd0ba038 c0108f70 c0089328 00000001 c0108f7c
> [ 113.910515] bd20: dd0bbda4 de606e00 00018000 00000000 dd0bbd5c dd0bbdb8 dd0ee920 dd0bbda4
> [ 113.919055] bd40: de60ee40 de606e00 dd0e5000 de664a00 dd0bbd8c dd0bbd60 c0108f7c c0108a24
> [ 113.927595] bd60: c008c410 c0089fd0 00000001 00000000 00018000 00000000 dd0bbe80 de60ee40
> [ 113.936134] bd80: dd0bbe14 dd0bbd90 c014c920 c0108f40 00004000 00000001 00000001 de274000
> [ 113.944674] bda0: 00004000 00000003 00002000 00002000 dd0bbd9c 00000001 de60ee40 00000000
> [ 113.953214] bdc0: 00000000 00000000 de606e00 00000000 00000000 00000000 00018000 00000000
> [ 113.961753] bde0: 00004000 00000000 00000000 00000000 de274000 de60ee40 de274000 dd0bbe80
> [ 113.970293] be00: 00004000 de6ce9c0 dd0bbe44 dd0bbe18 c014d1c8 c014c888 00000002 de6ce9c0
> [ 113.978833] be20: 00004000 00000000 00000000 00008000 de6ce9c0 dd0e5000 dd0bbeb4 dd0bbe48
> [ 113.987373] be40: bf059cc4 c014d120 00000000 dd0bbe9c dd0bbe68 bf05a04c 19000000 00000000
> [ 113.995912] be60: dd0ba000 00000000 00000000 6f48202c 00018000 00000000 00020000 00000000
> [ 114.004452] be80: 00018000 00000000 00000000 de664a00 de6ce9c0 00000000 de664a38 de664a00
> [ 114.012992] bea0: dd0ba038 de664a7c dd0bbf24 dd0bbeb8 bf05a938 bf059980 00000001 c00899dc
> [ 114.021531] bec0: a00f0013 de2e3bd4 00000000 00052000 00000000 dd0bbee0 c0089c50 c0089a70
> [ 114.030071] bee0: dd0bbf04 dd0bbef0 c064f3a4 de6ce840 00000000 de664a00 bf05a244 de6ce840
> [ 114.038611] bf00: 00000000 de664a00 bf05a244 00000000 00000000 00000000 dd0bbfac dd0bbf28
> [ 114.047151] bf20: c0065bdc bf05a250 c0089c50 00000000 dd0bbf54 de664a00 00000000 00000000
> [ 114.055690] bf40: dead4ead ffffffff ffffffff c0a8a238 00000000 00000000 c08070f8 dd0bbf5c
> [ 114.064230] bf60: dd0bbf5c 00000000 00000000 dead4ead ffffffff ffffffff c0a8a238 00000000
> [ 114.072770] bf80: 00000000 c08070f8 dd0bbf88 dd0bbf88 de6ce840 c0065af8 00000000 00000000
> [ 114.081310] bfa0: 00000000 dd0bbfb0 c000eea8 c0065b04 00000000 00000000 00000000 00000000
> [ 114.089850] bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> [ 114.098389] bfe0: 00000000 00000000 00000000 00000000 00000013 00000000 0001086e 00001a02
> [ 114.106944] [<c01065b4>] (find_get_entry) from [<c010716c>] (find_lock_entry+0x28/0x7c)
> [ 114.115316] [<c010716c>] (find_lock_entry) from [<c011df94>] (shmem_getpage_gfp+0x98/0x7f8)
> [ 114.124042] [<c011df94>] (shmem_getpage_gfp) from [<c011e74c>] (shmem_write_begin+0x58/0x94)
> [ 114.132856] [<c011e74c>] (shmem_write_begin) from [<c0106b10>] (generic_perform_write+0xb4/0x1c8)
> [ 114.142124] [<c0106b10>] (generic_perform_write) from [<c0108c6c>] (__generic_file_write_iter+0x254/0x51c)
> [ 114.152208] [<c0108c6c>] (__generic_file_write_iter) from [<c0108f7c>] (generic_file_write_iter+0x48/0xdc)
> [ 114.162298] [<c0108f7c>] (generic_file_write_iter) from [<c014c920>] (new_sync_write+0xa4/0xcc)
> [ 114.171386] [<c014c920>] (new_sync_write) from [<c014d1c8>] (vfs_write+0xb4/0x1c0)
> [ 114.179334] [<c014d1c8>] (vfs_write) from [<bf059cc4>] (do_write+0x350/0x4b8 [usb_f_mass_storage])
> [ 114.188719] [<bf059cc4>] (do_write [usb_f_mass_storage]) from [<bf05a938>] (fsg_main_thread+0x6f4/0x13f8 [usb_f_mass_storage])
> [ 114.200636] [<bf05a938>] (fsg_main_thread [usb_f_mass_storage]) from [<c0065bdc>] (kthread+0xe4/0x100)
> [ 114.210368] [<c0065bdc>] (kthread) from [<c000eea8>] (ret_from_fork+0x14/0x20)
> [ 114.217914] Code: e1a01008 eb08abbe e3500000 0a00001b (e5904000)
> [ 114.224529] ---[ end trace afb7e71d4b71be98 ]---
>
> for those who are coming by late, the problem happens when I use
> g_mass_storage with either Cortex A8 or Cortex A9 with two different USB
> peripheral controllers using either tmpfs or mmc as backing store.
>
> --
> balbi
Hi Johannes,
On Thu, Oct 09, 2014 at 12:01:38PM -0400, Johannes Weiner wrote:
> On Wed, Oct 08, 2014 at 04:29:38PM -0500, Felipe Balbi wrote:
> > Finally bisected it down to commit 139e561660fe11e0fc35e142a800df3dd7d03e9d
> > (lib: radix_tree: tree node interface). Here's full bisect log:
> >
> > git bisect start
> > # good: [455c6fdbd219161bd09b1165f11699d6d73de11c] Linux 3.14
> > git bisect good 455c6fdbd219161bd09b1165f11699d6d73de11c
> > # bad: [1860e379875dfe7271c649058aeddffe5afd9d0d] Linux 3.15
> > git bisect bad 1860e379875dfe7271c649058aeddffe5afd9d0d
> > # bad: [74a475acea49459721ae4b062d3da68c74259009] SubmittingPatches: add style recommendation to use imperative descriptions
> > git bisect bad 74a475acea49459721ae4b062d3da68c74259009
> > # good: [c12e69c6aaf785fd307d05cb6f36ca0e7577ead7] Merge tag 'staging-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
> > git bisect good c12e69c6aaf785fd307d05cb6f36ca0e7577ead7
> > # good: [0fc31966035d7a540c011b6c967ce8eae1db121b] Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
> > git bisect good 0fc31966035d7a540c011b6c967ce8eae1db121b
> > # good: [bdfc7cbdeef8cadba0e5793079ac0130b8e2220c] Merge branch 'mips-for-linux-next' of git://git.linux-mips.org/pub/scm/ralf/upstream-sfr
> > git bisect good bdfc7cbdeef8cadba0e5793079ac0130b8e2220c
> > # good: [0f1b1e6d73cb989ce2c071edc57deade3b084dfe] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
> > git bisect good 0f1b1e6d73cb989ce2c071edc57deade3b084dfe
> > # good: [181e7d5d7bd7747e882e3ca89ecbf0fc3e72d0da] ixgbe: remove redundant if clause from PTP work
> > git bisect good 181e7d5d7bd7747e882e3ca89ecbf0fc3e72d0da
> > # good: [59ecc26004e77e100c700b1d0da7502b0fdadb46] Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
> > git bisect good 59ecc26004e77e100c700b1d0da7502b0fdadb46
> > # good: [2b665e276c15ba7d9fc8cdd16931883a51ed13e4] fs/direct-io.c: remove redundant comparison
> > git bisect good 2b665e276c15ba7d9fc8cdd16931883a51ed13e4
> > # bad: [f412c97abef71026d8192ca8efca231f1e3906b3] mm, hugetlb: mark some bootstrap functions as __init
> > git bisect bad f412c97abef71026d8192ca8efca231f1e3906b3
> > # good: [4e35f483850ba46b838adfd312b3052416e15204] mm, hugetlb: use vma_resv_map() map types
> > git bisect good 4e35f483850ba46b838adfd312b3052416e15204
> > # good: [6dbaf22ce1f1dfba33313198eb5bd989ae76dd87] mm: shmem: save one radix tree lookup when truncating swapped pages
> > git bisect good 6dbaf22ce1f1dfba33313198eb5bd989ae76dd87
> > # good: [91b0abe36a7b2b3b02d7500925a5f8455334f0e5] mm + fs: store shadow entries in page cache
> > git bisect good 91b0abe36a7b2b3b02d7500925a5f8455334f0e5
> > # bad: [139e561660fe11e0fc35e142a800df3dd7d03e9d] lib: radix_tree: tree node interface
> > git bisect bad 139e561660fe11e0fc35e142a800df3dd7d03e9d
> > # good: [a528910e12ec7ee203095eb1711468a66b9b60b0] mm: thrash detection-based file cache sizing
> > git bisect good a528910e12ec7ee203095eb1711468a66b9b60b0
> > # first bad commit: [139e561660fe11e0fc35e142a800df3dd7d03e9d] lib: radix_tree: tree node interface
> >
> > I tried reverting that commit on v3.15 but it's non-trivial; I'll leave
> > that for tomorrow. Meanwhile, adding folks involved with that commit to
> > Cc list and another backtrace for reference:
> >
> > [ 113.696647] Unable to handle kernel paging request at virtual address ffffffff
> > [ 113.704370] pgd = c0004000
> > [ 113.707276] [ffffffff] *pgd=9fef6821, *pte=00000000, *ppte=00000000
> > [ 113.713998] Internal error: Oops: 17 [#1] SMP ARM
> > [ 113.718912] Modules linked in: g_mass_storage usb_f_mass_storage libcomposite configfs musb_dsps musb_hdrc musb_am335x
> > [ 113.730144] CPU: 0 PID: 1368 Comm: file-storage Not tainted 3.17.0-02899-g748eb79 #239
> > [ 113.738410] task: de606e00 ti: dd0ba000 task.ti: dd0ba000
> > [ 113.744060] PC is at find_get_entry+0x64/0x100
>
> Could you please provide the disassembly of that function?
here you go. It's ARM assembly however:
Dump of assembler code for function find_get_entry:
0xc011da48 <+0>: mov r12, sp
0xc011da4c <+4>: push {r4, r5, r6, r7, r8, r9, r11, r12, lr, pc}
0xc011da50 <+8>: sub r11, r12, #4
0xc011da54 <+12>: sub sp, sp, #16
0xc011da58 <+16>: push {lr} ; (str lr, [sp, #-4]!)
0xc011da5c <+20>: bl 0xc000ef00 <__gnu_mcount_nc>
0xc011da60 <+24>: mov r6, r0
0xc011da64 <+28>: mov r7, r1
0xc011da68 <+32>: ldr r2, [pc, #520] ; 0xc011dc78 <find_get_entry+560>
0xc011da6c <+36>: mov r3, #0
0xc011da70 <+40>: mov r1, r3
0xc011da74 <+44>: str r2, [sp, #8]
0xc011da78 <+48>: str r3, [sp]
0xc011da7c <+52>: mov r2, r3
0xc011da80 <+56>: str r3, [sp, #4]
0xc011da84 <+60>: ldr r0, [pc, #496] ; 0xc011dc7c <find_get_entry+564>
0xc011da88 <+64>: mov r3, #2
0xc011da8c <+68>: bl 0xc0095f88 <lock_acquire>
0xc011da90 <+72>: bl 0xc00a7b50 <debug_lockdep_rcu_enabled>
0xc011da94 <+76>: cmp r0, #0
0xc011da98 <+80>: beq 0xc011daac <find_get_entry+100>
0xc011da9c <+84>: ldr r4, [pc, #476] ; 0xc011dc80 <find_get_entry+568>
0xc011daa0 <+88>: ldrb r3, [r4, #1]
0xc011daa4 <+92>: cmp r3, #0
0xc011daa8 <+96>: beq 0xc011dbfc <find_get_entry+436>
0xc011daac <+100>: ldr r8, [pc, #460] ; 0xc011dc80 <find_get_entry+568>
0xc011dab0 <+104>: add r6, r6, #4
0xc011dab4 <+108>: mov r5, #1
0xc011dab8 <+112>: mov r0, r6
0xc011dabc <+116>: mov r1, r7
0xc011dac0 <+120>: bl 0xc0364660 <radix_tree_lookup_slot>
0xc011dac4 <+124>: subs r9, r0, #0
0xc011dac8 <+128>: beq 0xc011dc24 <find_get_entry+476>
0xc011dacc <+132>: ldr r4, [r9]
0xc011dad0 <+136>: bl 0xc00a7b50 <debug_lockdep_rcu_enabled>
0xc011dad4 <+140>: cmp r0, #0
0xc011dad8 <+144>: beq 0xc011dae8 <find_get_entry+160>
0xc011dadc <+148>: ldrb r3, [r8, #2]
0xc011dae0 <+152>: cmp r3, #0
0xc011dae4 <+156>: beq 0xc011dbcc <find_get_entry+388>
0xc011dae8 <+160>: cmp r4, #0
0xc011daec <+164>: beq 0xc011dc24 <find_get_entry+476>
0xc011daf0 <+168>: tst r4, #3
0xc011daf4 <+172>: bne 0xc011dc4c <find_get_entry+516>
0xc011daf8 <+176>: mov r2, sp
0xc011dafc <+180>: bic r3, r2, #8128 ; 0x1fc0
0xc011db00 <+184>: bic r3, r3, #63 ; 0x3f
0xc011db04 <+188>: ldr r2, [pc, #376] ; 0xc011dc84 <find_get_entry+572>
0xc011db08 <+192>: ldr r3, [r3, #4]
0xc011db0c <+196>: and r2, r2, r3
0xc011db10 <+200>: cmp r2, #0
0xc011db14 <+204>: bne 0xc011dc68 <find_get_entry+544>
0xc011db18 <+208>: add r3, r4, #16
0xc011db1c <+212>: mcr 15, 0, r2, cr7, cr10, {5}
0xc011db20 <+216>: mov r2, #0
0xc011db24 <+220>: pld [r3]
0xc011db28 <+224>: ldrex r1, [r3]
0xc011db2c <+228>: teq r1, r2
0xc011db30 <+232>: beq 0xc011db44 <find_get_entry+252>
0xc011db34 <+236>: add r0, r1, r5
0xc011db38 <+240>: strex r12, r0, [r3]
0xc011db3c <+244>: teq r12, #0
0xc011db40 <+248>: bne 0xc011db28 <find_get_entry+224>
0xc011db44 <+252>: cmp r1, #0
0xc011db48 <+256>: beq 0xc011dab8 <find_get_entry+112>
0xc011db4c <+260>: mov r3, #0
0xc011db50 <+264>: mcr 15, 0, r3, cr7, cr10, {5}
0xc011db54 <+268>: ldr r3, [r4]
0xc011db58 <+272>: tst r3, #32768 ; 0x8000
0xc011db5c <+276>: bne 0xc011dc58 <find_get_entry+528>
0xc011db60 <+280>: ldr r3, [r9]
0xc011db64 <+284>: cmp r3, r4
0xc011db68 <+288>: bne 0xc011dc6c <find_get_entry+548>
0xc011db6c <+292>: bl 0xc00a7b50 <debug_lockdep_rcu_enabled>
0xc011db70 <+296>: cmp r0, #0
0xc011db74 <+300>: beq 0xc011db88 <find_get_entry+320>
0xc011db78 <+304>: ldr r5, [pc, #256] ; 0xc011dc80 <find_get_entry+568>
0xc011db7c <+308>: ldrb r3, [r5, #3]
0xc011db80 <+312>: cmp r3, #0
0xc011db84 <+316>: beq 0xc011dba4 <find_get_entry+348>
0xc011db88 <+320>: ldr r0, [pc, #236] ; 0xc011dc7c <find_get_entry+564>
0xc011db8c <+324>: mov r1, #1
0xc011db90 <+328>: ldr r2, [pc, #240] ; 0xc011dc88 <find_get_entry+576>
0xc011db94 <+332>: bl 0xc0096380 <lock_release>
0xc011db98 <+336>: sub sp, r11, #36 ; 0x24
0xc011db9c <+340>: mov r0, r4
0xc011dba0 <+344>: ldm sp, {r4, r5, r6, r7, r8, r9, r11, sp, pc}
0xc011dba4 <+348>: bl 0xc00aadc4 <rcu_is_watching>
0xc011dba8 <+352>: cmp r0, #0
0xc011dbac <+356>: bne 0xc011db88 <find_get_entry+320>
0xc011dbb0 <+360>: mov r3, #1
0xc011dbb4 <+364>: ldr r0, [pc, #208] ; 0xc011dc8c <find_get_entry+580>
0xc011dbb8 <+368>: ldr r1, [pc, #208] ; 0xc011dc90 <find_get_entry+584>
0xc011dbbc <+372>: ldr r2, [pc, #208] ; 0xc011dc94 <find_get_entry+588>
0xc011dbc0 <+376>: strb r3, [r5, #3]
0xc011dbc4 <+380>: bl 0xc00920cc <lockdep_rcu_suspicious>
0xc011dbc8 <+384>: b 0xc011db88 <find_get_entry+320>
0xc011dbcc <+388>: bl 0xc00a7b50 <debug_lockdep_rcu_enabled>
0xc011dbd0 <+392>: cmp r0, #0
0xc011dbd4 <+396>: beq 0xc011dae8 <find_get_entry+160>
0xc011dbd8 <+400>: bl 0xc00aadc4 <rcu_is_watching>
0xc011dbdc <+404>: cmp r0, #0
0xc011dbe0 <+408>: bne 0xc011dc2c <find_get_entry+484>
0xc011dbe4 <+412>: ldr r0, [pc, #172] ; 0xc011dc98 <find_get_entry+592>
0xc011dbe8 <+416>: mov r1, #196 ; 0xc4
0xc011dbec <+420>: ldr r2, [pc, #168] ; 0xc011dc9c <find_get_entry+596>
0xc011dbf0 <+424>: strb r5, [r8, #2]
0xc011dbf4 <+428>: bl 0xc00920cc <lockdep_rcu_suspicious>
0xc011dbf8 <+432>: b 0xc011dae8 <find_get_entry+160>
0xc011dbfc <+436>: bl 0xc00aadc4 <rcu_is_watching>
0xc011dc00 <+440>: cmp r0, #0
0xc011dc04 <+444>: bne 0xc011daac <find_get_entry+100>
0xc011dc08 <+448>: mov r3, #1
0xc011dc0c <+452>: ldr r0, [pc, #120] ; 0xc011dc8c <find_get_entry+580>
0xc011dc10 <+456>: mov r1, #844 ; 0x34c
0xc011dc14 <+460>: ldr r2, [pc, #132] ; 0xc011dca0 <find_get_entry+600>
0xc011dc18 <+464>: strb r3, [r4, #1]
0xc011dc1c <+468>: bl 0xc00920cc <lockdep_rcu_suspicious>
0xc011dc20 <+472>: b 0xc011daac <find_get_entry+100>
0xc011dc24 <+476>: mov r4, #0
0xc011dc28 <+480>: b 0xc011db6c <find_get_entry+292>
0xc011dc2c <+484>: bl 0xc00ac38c <rcu_lockdep_current_cpu_online>
0xc011dc30 <+488>: cmp r0, #0
0xc011dc34 <+492>: beq 0xc011dbe4 <find_get_entry+412>
0xc011dc38 <+496>: ldr r0, [pc, #60] ; 0xc011dc7c <find_get_entry+564>
0xc011dc3c <+500>: bl 0xc0091264 <lock_is_held>
0xc011dc40 <+504>: cmp r0, #0
0xc011dc44 <+508>: beq 0xc011dbe4 <find_get_entry+412>
0xc011dc48 <+512>: b 0xc011dae8 <find_get_entry+160>
0xc011dc4c <+516>: tst r4, #1
0xc011dc50 <+520>: beq 0xc011db6c <find_get_entry+292>
0xc011dc54 <+524>: b 0xc011dab8 <find_get_entry+112>
0xc011dc58 <+528>: mov r0, r4
0xc011dc5c <+532>: ldr r1, [pc, #64] ; 0xc011dca4 <find_get_entry+604>
0xc011dc60 <+536>: bl 0xc01254d4 <dump_page>
0xc011dc64 <+540>: ; <UNDEFINED> instruction: 0xe7f001f2
0xc011dc68 <+544>: ; <UNDEFINED> instruction: 0xe7f001f2
0xc011dc6c <+548>: mov r0, r4
0xc011dc70 <+552>: bl 0xc012db6c <put_page>
0xc011dc74 <+556>: b 0xc011dab8 <find_get_entry+112>
0xc011dc78 <+560>: andsgt sp, r1, r8, asr #20
0xc011dc7c <+564>: adcgt r2, r11, r8, lsl r2
0xc011dc80 <+568>: ldrhtgt r0, [r0], r1
0xc011dc84 <+572>: andseq pc, pc, r0, lsl #30
0xc011dc88 <+576>: andsgt sp, r1, r8, lsl #23
0xc011dc8c <+580>: addgt sp, r5, r8, lsl #5
0xc011dc90 <+584>: andeq r0, r0, sp, ror r3
0xc011dc94 <+588>: ldrdgt sp, [r5], r0
0xc011dc98 <+592>: addgt sp, r7, r8, asr #7
0xc011dc9c <+596>: addgt lr, r6, r8, lsl #17
0xc011dca0 <+600>: addgt sp, r5, r4, lsr #5
0xc011dca4 <+604>: addgt sp, r7, r4, ror #7
End of assembler dump.
> I'm thinking it's not the slot pointer itself that's bad, because
> __radix_tree_lookup() dereferences that to test if it's populated
> before returning it, and slot life-time is guaranteed by RCU.
>
> That would only leave garbage in the slot itself, crashing during
> page_cache_get_speculative().
>
> I'll keep staring at this change, but nothing stands out to me yet.
alright, it's pretty deterministic however. Always on the same test, no
matter which USB controller, no matter if backing store is RAM or MMC.
Those two undefined instructions on the disassembly caught my attention,
perhaps I'm facing a GCC bug ?
--
balbi
Hi,
On Thu, Oct 09, 2014 at 11:26:56AM -0500, Felipe Balbi wrote:
> > I'm thinking it's not the slot pointer itself that's bad, because
> > __radix_tree_lookup() dereferences that to test if it's populated
> > before returning it, and slot life-time is guaranteed by RCU.
> >
> > That would only leave garbage in the slot itself, crashing during
> > page_cache_get_speculative().
> >
> > I'll keep staring at this change, but nothing stands out to me yet.
>
> alright, it's pretty deterministic however. Always on the same test, no
> matter which USB controller, no matter if backing store is RAM or MMC.
>
> Those two undefined instructions on the disassembly caught my attention,
> perhaps I'm facing a GCC bug ?
no, probably not a GCC bug. Looking at your commit, however: man, it
does quite a lot of things at once. It moves code around, adds new
functions by refactoring (and changing) code, renames things, and changes
int offsets into unsigned ints. It would not be too difficult to miss a
bug in there.
I'll continue digging here.
--
balbi
On Thu, Oct 09, 2014 at 11:26:56AM -0500, Felipe Balbi wrote:
> alright, it's pretty deterministic however. Always on the same test, no
> matter which USB controller, no matter if backing store is RAM or MMC.
>
> Those two undefined instructions on the disassembly caught my attention,
> perhaps I'm facing a GCC bug ?
The undefined instructions are just ARM's BUG() implementation.
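For reference, the word 0xe7f001f2 at +540 and +544 in the dump above
matches, as far as I can tell, the ARM-mode BUG_INSTR_VALUE in
arch/arm/include/asm/bug.h, and the two sites look like the VM_BUG_ON
checks inlined from page_cache_get_speculative(). A minimal sketch for
spotting such words in a disassembly (the macro and helper names are
hypothetical, not a kernel API):

```c
#include <stdbool.h>
#include <stdint.h>

/* Encoding that shows up as "<UNDEFINED> instruction" in the dump above. */
#define ARM_BUG_INSTR   0xe7f001f2u
/* Assumption: Thumb-2 kernels use a 16-bit encoding instead. */
#define THUMB_BUG_INSTR 0xde02u

/* Return true if a 32-bit ARM-mode instruction word is the BUG() trap. */
bool is_arm_bug_instr(uint32_t insn)
{
	return insn == ARM_BUG_INSTR;
}
```

The faulting instruction from the oops, (e5904000), is an ordinary load
and is not matched by this check.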
But did you see the question I asked you yesterday in your other thread?
http://www.spinics.net/lists/arm-kernel/msg368634.html
Here it is again:
What GCC version are you using?
4.8.1 and 4.8.2 are known to miscompile the ARM kernel and these
find_get_entry() crashes with 0xffffffff involved smell a lot like the
earlier reports from kernels built with those compilers:
https://lkml.org/lkml/2014/6/25/456
https://lkml.org/lkml/2014/6/30/375
https://lkml.org/lkml/2014/6/30/660
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58854
https://lkml.org/lkml/2014/5/9/330
Also, I didn't see any public email making a definitive link between GCC
PR 58854 that Nathan pointed out in https://lkml.org/lkml/2014/6/30/660
and the earlier find_get_entry() crashes, but I just built GCC 4.8.1 and
an ARM kernel with that, and the GCC bug is clearly seen in
radix_tree_lookup_slot() which returns the pointer which
find_get_entry() is dereferencing:
<radix_tree_lookup_slot>:
e1a0c00d mov ip, sp
e92dd800 push {fp, ip, lr, pc}
e24cb004 sub fp, ip, #4
e24dd008 sub sp, sp, #8
e3a02000 mov r2, #0
e24b3010 sub r3, fp, #16
ebffffc5 bl c0176ab8 <__radix_tree_lookup>
e24bd00c sub sp, fp, #12 <--- sp moved up
e3500000 cmp r0, #0
151b0010 ldrne r0, [fp, #-16] <--- load from under sp
e89da800 ldm sp, {fp, sp, pc}
Please check your compiler to make sure it's not the same problem.
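Read against the C source, the listing above corresponds roughly to the
pattern below: a local `slot` whose address is passed to the lookup and
which is read back just before returning. The sketch is a simplified
user-space stand-in with hypothetical names (lookup, lookup_slot), not the
kernel code. Its only purpose is to show that the read of `slot` has to
happen while the frame is still live. In the miscompiled version the stack
pointer is restored first, so, presumably, anything that uses the stack in
between (an interrupt, for example) can overwrite the value before it is
loaded.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for __radix_tree_lookup(): returns the item at
 * `index` (or NULL) and reports the slot address through `slotp`.
 */
void *lookup(void **table, size_t nslots, size_t index, void ***slotp)
{
	if (index >= nslots || table[index] == NULL)
		return NULL;
	if (slotp)
		*slotp = &table[index];
	return table[index];
}

/*
 * Same shape as radix_tree_lookup_slot(): `slot` lives in this function's
 * stack frame and is written by the callee through a pointer.
 */
void **lookup_slot(void **table, size_t nslots, size_t index)
{
	void **slot;

	if (!lookup(table, nslots, index, &slot))
		return NULL;
	/* Correct code loads `slot` here, before the frame is released. */
	return slot;
}
```

With a correct compiler, lookup_slot() returns the address of the
populated slot, or NULL for an empty or out-of-range index.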
Hi,
On Thu, Oct 09, 2014 at 10:41:01PM +0200, Rabin Vincent wrote:
> On Thu, Oct 09, 2014 at 11:26:56AM -0500, Felipe Balbi wrote:
> > alright, it's pretty deterministic however. Always on the same test, no
> > matter which USB controller, no matter if backing store is RAM or MMC.
> >
> > Those two undefined instructions on the disassembly caught my attention,
> > perhaps I'm facing a GCC bug ?
>
> The undefined instructions are just ARM's BUG() implementation.
>
> But did you see the question I asked you yesterday in your other thread?
> http://www.spinics.net/lists/arm-kernel/msg368634.html
hmm, completely missed that, sorry. I'm using 4.8.2, will try something
else.
--
balbi
Hi,
On Thu, Oct 09, 2014 at 03:46:37PM -0500, Felipe Balbi wrote:
> On Thu, Oct 09, 2014 at 10:41:01PM +0200, Rabin Vincent wrote:
> > On Thu, Oct 09, 2014 at 11:26:56AM -0500, Felipe Balbi wrote:
> > > alright, it's pretty deterministic however. Always on the same test, no
> > > matter which USB controller, no matter if backing store is RAM or MMC.
> > >
> > > Those two undefined instructions on the disassembly caught my attention,
> > > perhaps I'm facing a GCC bug ?
> >
> > The undefined instructions are just ARM's BUG() implementation.
> >
> > But did you see the question I asked you yesterday in your other thread?
> > http://www.spinics.net/lists/arm-kernel/msg368634.html
>
> hmm, completely missed that, sorry. I'm using 4.8.2, will try something
> else.
seems to be working fine now, thanks. I'll leave test running overnight
just in case.
thanks again, and sorry for the noise.
PS: I wonder if we should add a warning message to the build system if
we're building with known broken versions of GCC.
--
balbi
Hi,
On Thu, Oct 09, 2014 at 10:41:01PM +0200, Rabin Vincent wrote:
> What GCC version are you using?
>
> 4.8.1 and 4.8.2 are known to miscompile the ARM kernel and these
> find_get_entry() crashes with 0xffffffff involved smell a lot like the
> earlier reports from kernels built with those compilers:
>
> https://lkml.org/lkml/2014/6/25/456
> https://lkml.org/lkml/2014/6/30/375
> https://lkml.org/lkml/2014/6/30/660
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58854
> https://lkml.org/lkml/2014/5/9/330
Is it possible to blacklist those GCC versions on ARM somehow as it
seems people are still using them?
This bug also ruined a file system on one of my boxes last year
(see e.g. http://marc.info/?l=linux-arm-kernel&m=139033442527244&w=2).
A.
On Thu, Oct 09, 2014 at 04:07:15PM -0500, Felipe Balbi wrote:
> Hi,
>
> On Thu, Oct 09, 2014 at 03:46:37PM -0500, Felipe Balbi wrote:
> > On Thu, Oct 09, 2014 at 10:41:01PM +0200, Rabin Vincent wrote:
> > > On Thu, Oct 09, 2014 at 11:26:56AM -0500, Felipe Balbi wrote:
> > > > alright, it's pretty deterministic however. Always on the same test, no
> > > > matter which USB controller, no matter if backing store is RAM or MMC.
> > > >
> > > > Those two undefined instructions on the disassembly caught my attention,
> > > > perhaps I'm facing a GCC bug ?
> > >
> > > The undefined instructions are just ARM's BUG() implementation.
> > >
> > > But did you see the question I asked you yesterday in your other thread?
> > > http://www.spinics.net/lists/arm-kernel/msg368634.html
> >
> > hmm, completely missed that, sorry. I'm using 4.8.2, will try something
> > else.
>
> seems to be working fine now, thanks. I'll leave test running overnight
> just in case.
yup, ran over night without any problems.
--
balbi
On Fri, Oct 10, 2014 at 12:47:06AM +0300, Aaro Koskinen wrote:
> Hi,
>
> On Thu, Oct 09, 2014 at 10:41:01PM +0200, Rabin Vincent wrote:
> > What GCC version are you using?
> >
> > 4.8.1 and 4.8.2 are known to miscompile the ARM kernel and these
> > find_get_entry() crashes with 0xffffffff involved smell a lot like the
> > earlier reports from kernels built with those compilers:
> >
> > https://lkml.org/lkml/2014/6/25/456
> > https://lkml.org/lkml/2014/6/30/375
> > https://lkml.org/lkml/2014/6/30/660
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58854
> > https://lkml.org/lkml/2014/5/9/330
>
> Is it possible to blacklist those GCC versions on ARM somehow as it
> seems people are still using them?
>
> This bug also ruined a file system on one of my boxes last year
> (see e.g. http://marc.info/?l=linux-arm-kernel&m=139033442527244&w=2).
Given that, why the fsck (pun intended) did you not shout a little louder
about getting it blacklisted? Looking at your marc.info URL, there's
very little information there which hints at filesystem corruption, and
it's a thread of only *one* message according to marc.info.
Even _if_ I did read the message you point to above, that on its own did
not hint at filesystem corruption.
So, would you please mind passing on further details about this,
specifically which function in the ext4 code is affected, so it can
be properly written up.
Thanks.
--
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.
On Fri, Oct 10, 2014 at 08:57:43AM -0500, Felipe Balbi wrote:
> On Thu, Oct 09, 2014 at 04:07:15PM -0500, Felipe Balbi wrote:
> > Hi,
> >
> > On Thu, Oct 09, 2014 at 03:46:37PM -0500, Felipe Balbi wrote:
> > > On Thu, Oct 09, 2014 at 10:41:01PM +0200, Rabin Vincent wrote:
> > > > On Thu, Oct 09, 2014 at 11:26:56AM -0500, Felipe Balbi wrote:
> > > > > alright, it's pretty deterministic however. Always on the same test, no
> > > > > matter which USB controller, no matter if backing store is RAM or MMC.
> > > > >
> > > > > Those two undefined instructions on the disassembly caught my attention,
> > > > > perhaps I'm facing a GCC bug ?
> > > >
> > > > The undefined instructions are just ARM's BUG() implementation.
> > > >
> > > > But did you see the question I asked you yesterday in your other thread?
> > > > http://www.spinics.net/lists/arm-kernel/msg368634.html
> > >
> > > hmm, completely missed that, sorry. I'm using 4.8.2, will try something
> > > else.
> >
> > seems to be working fine now, thanks. I'll leave test running overnight
> > just in case.
>
> yup, ran over night without any problems.
Right, so GCC 4.8.{1,2} are totally unsuitable for kernel building (and
it seems that this has been known about for some time.)
We can blacklist these GCC versions quite easily. We already have GCC
3.3 blacklisted, and it's trivial to add others. I would want to include
some proper details about the bug, just like the other existing entries
we already have in asm-offsets.c, where we name the functions that the
compiler is known to break where appropriate.
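A blacklist entry of that kind could look roughly like the sketch below.
It is only an illustration: the GCC_VERSION_CODE and gcc_known_bad names
are made up here, the encoding mirrors the kernel's GCC_VERSION convention
(major * 10000 + minor * 100 + patchlevel), and the affected range is
assumed to be just the 4.8.1 and 4.8.2 releases named in this thread.

```c
/* Encode a GCC version as major * 10000 + minor * 100 + patchlevel. */
#define GCC_VERSION_CODE(maj, min, patch) \
	((maj) * 10000 + (min) * 100 + (patch))

/* Assumed bad range: the two releases reported in this thread. */
#define GCC_BAD_FIRST GCC_VERSION_CODE(4, 8, 1)
#define GCC_BAD_LAST  GCC_VERSION_CODE(4, 8, 2)

/* Run-time form of the same check, handy for scripts and tests. */
int gcc_known_bad(int version)
{
	return version >= GCC_BAD_FIRST && version <= GCC_BAD_LAST;
}

/* Build-time form: refuse to compile for ARM with a listed compiler. */
#if defined(__arm__) && defined(__GNUC__) && !defined(__clang__)
# define THIS_GCC \
	GCC_VERSION_CODE(__GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__)
# if THIS_GCC >= GCC_BAD_FIRST && THIS_GCC <= GCC_BAD_LAST
#  error "This GCC is reported to miscompile the ARM kernel (GCC PR 58854)"
# endif
#endif
```

A real entry would also name the functions known to be miscompiled, as
the existing entries do.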
However, I'm rather annoyed that there are people here who have known
for some time that GCC 4.8.1 and GCC 4.8.2 _can_ lead to filesystem
corruption, and have sat on their backsides doing nothing about getting
it blacklisted for something like a year.
When people talk about the ARM community being dysfunctional... well,
this kind of irresponsible behaviour just gives them more fodder to
throw at us.
On Fri, Oct 10, 2014 at 05:18:35PM +0100, Russell King - ARM Linux wrote:
> On Fri, Oct 10, 2014 at 12:47:06AM +0300, Aaro Koskinen wrote:
> > On Thu, Oct 09, 2014 at 10:41:01PM +0200, Rabin Vincent wrote:
> > > What GCC version are you using?
> > >
> > > 4.8.1 and 4.8.2 are known to miscompile the ARM kernel and these
> > > find_get_entry() crashes with 0xffffffff involved smell a lot like the
> > > earlier reports from kernels built with those compilers:
> > >
> > > https://lkml.org/lkml/2014/6/25/456
> > > https://lkml.org/lkml/2014/6/30/375
> > > https://lkml.org/lkml/2014/6/30/660
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58854
> > > https://lkml.org/lkml/2014/5/9/330
> >
> > Is it possible to blacklist those GCC versions on ARM somehow as it
> > seems people are still using them?
> >
> > This bug also ruined a file system on one of my boxes last year
> > (see e.g. http://marc.info/?l=linux-arm-kernel&m=139033442527244&w=2).
>
> Given that, why the fsck (pun intended) did you not shout a little louder
> about getting it blacklisted? Looking at your marc.info URL, there's
> very little information there which hints at filesystem corruption, and
> it's a thread of only *one* message according to marc.info.
>
> Even _if_ I did read the message you point to above, that on its own did
> not hint at filesystem corruption.
>
> So, would you please mind passing on further details about this,
> specifically which function in the ext4 code is affected, so it can
> be properly written up.
I have not done any proper deeper analysis. After I first mailed about
the issue I just downgraded GCC and pretty much forgot about it until
an engineer from some commercial Linux vendor replied privately months
later and kindly pointed me to the needed GCC fix (which I then shared
in the reply). Then I just moved on using a newer GCC with no issues.
Obviously this was not a widespread problem since no one else
reported the same.
Today I again booted a kernel compiled with GCC 4.8.2 and was still able
to reproduce the issue, and I think the log below shows that at least ext3
can easily end up in an inconsistent state using these compiler versions:
0) Run the bad kernel:
~ # dmesg|grep GCC
[ 0.000000] Linux version 3.17.0-mvebu-los_9755+ (aaro@cooljazz) (gcc version 4.8.2 (GCC) ) #1 Fri Oct 10 21:05:20 EEST 2014
1) Start with small ext3 (writeback) fs with gcc tarball:
/mnt/test # ls -l
total 84092
-rw-r--r-- 1 root root 85999682 Apr 24 21:52 gcc-4.8.2.tar.bz2
drwx------ 2 root root 16384 Oct 10 10:33 lost+found
/mnt/test # df -h .
Filesystem Size Used Available Use% Mounted on
/dev/sdc1 3.8G 90.2M 3.5G 2% /mnt/test
2) Extract, delete & crash:
/mnt/test # tar xjf gcc-4.8.2.tar.bz2
/mnt/test # rm -rf gcc-4.8.2
rm: can't remove 'gcc-4.8.2/libgfortran/generated': Directory not empty
rm: can't remove 'gcc-4.8.2/libgfortran': Directory not empty
rm: can't remove 'gcc-4.8.2/gcc/testsuite/gcc.dg/compat/struct-by-value-18a_y.c': No such file or directory
rm: can't remove 'gcc-4.8.2/gcc/testsuite/gcc.dg/compat': Directory not empty
rm: can't remove 'gcc-4.8.2/gcc/testsuite/gcc.dg/tree-ssa': Directory not empty
rm: can't remove 'gcc-4.8.2/gcc/testsuite/gcc.dg': Directory not empty
rm: can't remove 'gcc-4.8.2/gcc/testsuite/gfortran.dg/result_default_init_1.f90': No such file or directory
rm: can't remove 'gcc-4.8.2/gcc/testsuite/gfortran.dg': Directory not empty
[ 960.864433] Unable to handle kernel paging request at virtual address ffffffff
[ 960.930597] pgd = df6e0000
[ 960.990849] [ffffffff] *pgd=1fffd831, *pte=00000000, *ppte=00000000
[ 961.056512] Internal error: Oops: 1 [#1] ARM
[ 961.120063] Modules linked in:
[ 961.180974] CPU: 0 PID: 684 Comm: rm Not tainted 3.17.0-mvebu-los_9755+ #1
[ 961.247146] task: df447b00 ti: df4de000 task.ti: df4de000
[ 961.311524] PC is at find_get_entry+0x28/0x84
[ 961.375037] LR is at radix_tree_lookup_slot+0x1c/0x2c
[ 961.439061] pc : [<c006e418>] lr : [<c018392c>] psr: a0000013
[ 961.439061] sp : df4dfc68 ip : 00000000 fp : df4dfc7c
[ 961.570018] r10: 00000001 r9 : c04e3253 r8 : df020b60
[ 961.634596] r7 : 0009001a r6 : 00000000 r5 : 0009001a r4 : df020c90
[ 961.700070] r3 : ffffffff r2 : 00000000 r1 : 0009001a r0 : ffffffff
[ 961.764437] Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
[ 961.830518] Control: 0005317f Table: 1f6e0000 DAC: 00000015
[ 961.895866] Process rm (pid: 684, stack limit = 0xdf4de1c0)
[ 961.960597] Stack: (0xdf4dfc68 to 0xdf4e0000)
[ 962.022968] fc60: 00000001 df020c8c df4dfcb4 df4dfc80 c006ef68 c006e400
[ 962.091214] fc80: c00d4e80 c00d4764 00001000 0009001a 00000000 00000000 df020b60 df020b60
[ 962.159490] fca0: df020bd8 df04e4d8 df4dfd04 df4dfcb8 c00d34c0 c006ef44 00000000 df4dfcc8
[ 962.226940] fcc0: c00d4e80 c00d4764 00001000 00000001 df4dfd84 dd1c73f0 00090306 00000000
[ 962.295558] fce0: 00090068 00000000 00000000 df020b60 df04e4d8 00000181 df4dfd4c df4dfd08
[ 962.364710] fd00: c00d4828 c00d347c 00000000 00000001 df4dfdc4 dd1c73f0 00000000 00000000
[ 962.433394] fd20: 00000000 00000000 df4dfd84 00090002 00001000 dbaa2200 df020b60 df04e4d8
[ 962.501810] fd40: df4dfdbc df4dfd50 c00d4e80 c00d4764 00001000 df4dfd60 c0141284 c0148708
[ 962.569685] fd60: 0009001a 00000000 c0ebc7c0 df041180 00000002 00000000 df4dfd9c df4dfd88
[ 962.639143] fd80: c003813c c0038084 df041180 df0b7320 df4dfdac 00090002 00000000 dbaa2200
[ 962.708562] fda0: df4dfe4c df04e4d8 00000181 df04e4d8 df4dfe24 df4dfdc0 c01087c0 c00d4e6c
[ 962.778108] fdc0: 00001000 c038caf8 0000128f 00000000 00000000 00011000 00000001 c9c59740
[ 962.846670] fde0: 0009001a 00000000 00000a26 c824f240 00000010 00000000 df4dfe1c df04e4d8
[ 962.913956] fe00: df04e4d8 df4dfe4c de53cf40 de53cf40 00000000 df04e4d8 df4dfe44 df4dfe28
[ 962.980679] fe20: c010c5a8 c01086c4 df04e4d8 dee12000 dbaa2200 df04e4b4 df4dfe84 df4dfe48
[ 963.046696] fe40: c0115dc4 c010c584 dd1c73f0 00000000 00000100 00000012 00000000 c0fbfe00
[ 963.112648] fe60: df04e4d8 dd1c73f0 de53cf40 00000000 df4dff04 df04e4d8 df4dfecc df4dfe88
[ 963.178402] fe80: c0116b24 c0115ce0 00000000 c00b3b24 df4dfeac c067b174 5437d0a4 22921900
[ 963.244947] fea0: df4dfecc df4dfeb0 c00b7a50 c19ca440 df04e4d8 df04e534 dd1c73f0 000b6650
[ 963.311517] fec0: df4dfefc df4dfed0 c00b7e4c c01168d8 df4dfefc df4dfee0 c19ca440 00000000
[ 963.377319] fee0: df4e6000 00000000 000b6650 ffffff9c df4dff94 df4dff00 c00b80b0 c00b7d94
[ 963.443083] ff00: 5437d035 00000000 dba4a8d0 d899f6e8 78ae7ba4 0000000d df4e603c 0000000c
[ 963.509416] ff20: 00000000 c0009624 dd1c73f0 00000000 00000004 00000038 00000000 00000000
[ 963.575556] ff40: 00024182 00000000 00800021 c04c81b4 00000001 000003e8 000003e8 00000000
[ 963.641281] ff60: 0000024d 00000000 4bfad53f 000b6650 00000008 0000000c 0000000a c0009624
[ 963.707194] ff80: df4de000 00000000 df4dffa4 df4dff98 c00b8e20 c00b7ed0 00000000 df4dffa8
[ 963.773584] ffa0: c00094c0 c00b8e18 000b6650 00000008 000b6650 bed03990 bed03990 00008000
[ 963.841022] ffc0: 000b6650 00000008 0000000c 0000000a 000b6650 00000000 b6fcc000 00000000
[ 963.907530] ffe0: 00093224 bed0398c 00071284 b6efa39c 60000010 000b6650 0000ffff 0000ffff
[ 963.973653] Backtrace:
[ 964.032680] [<c006e3f0>] (find_get_entry) from [<c006ef68>] (pagecache_get_page+0x34/0x1fc)
[ 964.100751] r5:df020c8c r4:00000001
[ 964.162591] [<c006ef34>] (pagecache_get_page) from [<c00d34c0>] (__find_get_block_slow+0x54/0x16c)
[ 964.291505] r10:df04e4d8 r9:df020bd8 r8:df020b60 r7:df020b60 r6:00000000 r5:00000000
[ 964.361857] r4:0009001a
[ 964.425342] [<c00d346c>] (__find_get_block_slow) from [<c00d4828>] (__find_get_block+0xd4/0x1e4)
[ 964.498345] r9:00000181 r8:df04e4d8 r7:df020b60 r6:00000000 r5:00000000 r4:00090068
[ 964.570979] [<c00d4754>] (__find_get_block) from [<c00d4e80>] (__getblk+0x24/0x358)
[ 964.643833] r8:df04e4d8 r7:df020b60 r6:dbaa2200 r5:00001000 r4:00090002
[ 964.716031] [<c00d4e5c>] (__getblk) from [<c01087c0>] (__ext4_get_inode_loc+0x10c/0x454)
[ 964.790734] r10:df04e4d8 r9:00000181 r8:df04e4d8 r7:df4dfe4c r6:dbaa2200 r5:00000000
[ 964.865945] r4:00090002
[ 964.934187] [<c01086b4>] (__ext4_get_inode_loc) from [<c010c5a8>] (ext4_reserve_inode_write+0x34/0x9c)
[ 965.080216] r10:df04e4d8 r9:00000000 r8:de53cf40 r7:de53cf40 r6:df4dfe4c r5:df04e4d8
[ 965.159656] r4:df04e4d8
[ 965.232230] [<c010c574>] (ext4_reserve_inode_write) from [<c0115dc4>] (ext4_orphan_add+0xf4/0x218)
[ 965.385687] r7:df04e4b4 r6:dbaa2200 r5:dee12000 r4:df04e4d8
[ 965.464523] [<c0115cd0>] (ext4_orphan_add) from [<c0116b24>] (ext4_unlink+0x25c/0x26c)
[ 965.547430] r10:df04e4d8 r9:df4dff04 r8:00000000 r7:de53cf40 r6:dd1c73f0 r5:df04e4d8
[ 965.631429] r4:c0fbfe00
[ 965.708445] [<c01168c8>] (ext4_unlink) from [<c00b7e4c>] (vfs_unlink+0xc8/0x13c)
[ 965.792677] r8:000b6650 r7:dd1c73f0 r6:df04e534 r5:df04e4d8 r4:c19ca440
[ 965.877297] [<c00b7d84>] (vfs_unlink) from [<c00b80b0>] (do_unlinkat+0x1f0/0x210)
[ 965.963851] r9:ffffff9c r8:000b6650 r7:00000000 r6:df4e6000 r5:00000000 r4:c19ca440
[ 966.051666] [<c00b7ec0>] (do_unlinkat) from [<c00b8e20>] (SyS_unlink+0x18/0x1c)
[ 966.139262] r10:00000000 r9:df4de000 r8:c0009624 r7:0000000a r6:0000000c r5:00000008
[ 966.228970] r4:000b6650
[ 966.311776] [<c00b8e08>] (SyS_unlink) from [<c00094c0>] (ret_fast_syscall+0x0/0x2c)
[ 966.401452] Code: e1a01005 eb04553f e2503000 0a00000f (e5930000)
[ 966.608250] ---[ end trace a1b54af48fda09ed ]---
[ 966.693854] Kernel panic - not syncing: Fatal exception
[ 966.781707] ---[ end Kernel panic - not syncing: Fatal exception
3) Boot a good kernel:
~ # dmesg | grep GCC
[ 0.000000] Linux version 3.17.0-mvebu-los_1b42 (aaro@cooljazz) (gcc version 4.9.1 (GCC) ) #1 Thu Oct 9 06:46:07 EEST 2014
4) Use the aforementioned file system and try to clean up the mess:
/mnt/test # df -h .
Filesystem Size Used Available Use% Mounted on
/dev/sdc1 3.8G 796.2M 2.8G 22% /mnt/test
/mnt/test # rm -rf gcc-4.8.2
rm: can't remove 'gcc-4.8.2/gcc/testsuite/gcc.dg/tree-ssa': Directory not empty
rm: can't remove 'gcc-4.8.2/gcc/testsuite/gcc.dg': Directory not empty
rm: can't remove 'gcc-4.8.2/gcc/testsuite/gfortran.dg': Directory not empty
rm: can't remove 'gcc-4.8.2/gcc/testsuite': Directory not empty
rm: can't remove 'gcc-4.8.2/gcc': Directory not empty
rm: can't remove 'gcc-4.8.2': Directory not empty
/mnt/test # rm -rf gcc-4.8.2
rm: can't remove 'gcc-4.8.2/gcc/testsuite/gcc.dg/tree-ssa': Directory not empty
rm: can't remove 'gcc-4.8.2/gcc/testsuite/gcc.dg': Directory not empty
rm: can't remove 'gcc-4.8.2/gcc/testsuite/gfortran.dg': Directory not empty
rm: can't remove 'gcc-4.8.2/gcc/testsuite': Directory not empty
rm: can't remove 'gcc-4.8.2/gcc': Directory not empty
rm: can't remove 'gcc-4.8.2': Directory not empty
/mnt/test # df -h .
Filesystem Size Used Available Use% Mounted on
/dev/sdc1 3.8G 90.5M 3.5G 2% /mnt/test
/mnt/test # find gcc-4.8.2
gcc-4.8.2
gcc-4.8.2/gcc
gcc-4.8.2/gcc/testsuite
gcc-4.8.2/gcc/testsuite/gcc.dg
gcc-4.8.2/gcc/testsuite/gcc.dg/tree-ssa
find: gcc-4.8.2/gcc/testsuite/gcc.dg/tree-ssa/forwprop-8.c: No such file or directory
gcc-4.8.2/gcc/testsuite/gfortran.dg
find: gcc-4.8.2/gcc/testsuite/gfortran.dg/result_default_init_1.f90: No such file or directory
5) fsck to the rescue:
/mnt/test # cd /
~ # umount /mnt/test
~ # fsck /dev/sdc1
fsck 1.42.9 (28-Dec-2013)
e2fsck 1.42.9 (28-Dec-2013)
/dev/sdc1: clean, 21/262144 files, 72408/1048576 blocks
~ # fsck -f /dev/sdc1
fsck 1.42.9 (28-Dec-2013)
e2fsck 1.42.9 (28-Dec-2013)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Problem in HTREE directory inode 118267: block #4 has bad min hash
Problem in HTREE directory inode 118267: block #26 has bad max hash
Invalid HTREE directory inode 118267 (/gcc-4.8.2/gcc/testsuite/gfortran.dg). Clear HTree index<y>? yes
Problem in HTREE directory inode 174218: block #8 has bad min hash
Invalid HTREE directory inode 174218 (/gcc-4.8.2/gcc/testsuite/gcc.dg/tree-ssa). Clear HTree index<y>? yes
Pass 3: Checking directory connectivity
Pass 3A: Optimizing directories
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/sdc1: ***** FILE SYSTEM WAS MODIFIED *****
/dev/sdc1: 21/262144 files (19.0% non-contiguous), 72368/1048576 blocks
~ # mount /dev/sdc1 /mnt/
~ # rm -rf /mnt/gcc-4.8.2
~ #
So in this case fsck was able to fix it.
A.
On 10/10/2014 11:25 AM, Russell King - ARM Linux wrote:
>
> Right, so GCC 4.8.{1,2} are totally unsuitable for kernel building (and
> it seems that this has been known about for some time.)
Looking at http://gcc.gnu.org/PR58854 it seems that all 4.8.x for x < 3
are affected, as well as 4.9.0.
> We can blacklist these GCC versions quite easily. We already have GCC
> 3.3 blacklisted, and it's trivial to add others. I would want to include
> some proper details about the bug, just like the other existing entries
> we already have in asm-offsets.c, where we name the functions that the
> compiler is known to break where appropriate.
Before blacklisting anything, it's worth considering that simple version
checks would break existing pre-4.8.3 compilers that have been patched
for PR58854. It looks like Yocto and Buildroot issued releases with
patched 4.8.2 compilers well before the (fixed) 4.8.3 release. I think
the most we can reasonably do without breaking some correctly-behaving
toolchains is to emit a warning.
Hopefully nobody's still using gcc 4.8 from the Linaro 2013.11 toolchain
release -- since it's a 4.8.3 prerelease from before the fix was
committed you'll get GCC_VERSION == 40803 but still generate bad code.
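To illustrate why a plain version check cannot catch that prerelease: assuming
GCC_VERSION is built from the compiler's __GNUC__, __GNUC_MINOR__ and
__GNUC_PATCHLEVEL__ values in the usual major * 10000 + minor * 100 + patchlevel
form, anything that identifies itself as 4.8.3 yields 40803, fixed or not. A
minimal sketch (the helper name gcc_version_code is made up for illustration):

```c
#include <assert.h>

/*
 * Hypothetical helper mirroring the assumed GCC_VERSION encoding:
 * major * 10000 + minor * 100 + patchlevel.
 */
static int gcc_version_code(int major, int minor, int patchlevel)
{
	/* Each component must fit in two decimal digits to stay unambiguous. */
	assert(minor >= 0 && minor < 100);
	assert(patchlevel >= 0 && patchlevel < 100);

	return major * 10000 + minor * 100 + patchlevel;
}
```

The number carries no information about which patches a given build contains,
so a 4.8.3 prerelease and the 4.8.3 release compare equal.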
> However, I'm rather annoyed that there are people here who have known
> for some time that GCC 4.8.1 and GCC 4.8.2 _can_ lead to filesystem
> corruption, and have sat on their backsides doing nothing about getting
> it blacklisted for something like a year.
Mea culpa, although I hadn't drawn the connection to FS corruption
reports until now. I have known about the issue for some time, but
figured the prevalence of the fix in downstream projects largely
mitigated the issue.
On 10/10/2014 09:44 PM, Nathan Lynch wrote:
> On 10/10/2014 11:25 AM, Russell King - ARM Linux wrote:
>>
>> Right, so GCC 4.8.{1,2} are totally unsuitable for kernel building (and
>> it seems that this has been known about for some time.)
>
> Looking at http://gcc.gnu.org/PR58854 it seems that all 4.8.x for x < 3
> are affected, as well as 4.9.0.
>
>> We can blacklist these GCC versions quite easily. We already have GCC
>> 3.3 blacklisted, and it's trivial to add others. I would want to include
>> some proper details about the bug, just like the other existing entries
>> we already have in asm-offsets.c, where we name the functions that the
>> compiler is known to break where appropriate.
>
> Before blacklisting anything, it's worth considering that simple version
> checks would break existing pre-4.8.3 compilers that have been patched
> for PR58854. It looks like Yocto and Buildroot issued releases with
> patched 4.8.2 compilers well before the (fixed) 4.8.3 release. I think
> the most we can reasonably do without breaking some correctly-behaving
> toolchains is to emit a warning.
Providing a manual switch to override blacklisting is way more sane
than a build warning that no one's looking at.
On Fri, Oct 10, 2014 at 08:44:33PM -0500, Nathan Lynch wrote:
> On 10/10/2014 11:25 AM, Russell King - ARM Linux wrote:
> >
> > Right, so GCC 4.8.{1,2} are totally unsuitable for kernel building (and
> > it seems that this has been known about for some time.)
>
> Looking at http://gcc.gnu.org/PR58854 it seems that all 4.8.x for x < 3
> are affected, as well as 4.9.0.
>
> > We can blacklist these GCC versions quite easily. We already have GCC
> > 3.3 blacklisted, and it's trivial to add others. I would want to include
> > some proper details about the bug, just like the other existing entries
> > we already have in asm-offsets.c, where we name the functions that the
> > compiler is known to break where appropriate.
>
> Before blacklisting anything, it's worth considering that simple version
> checks would break existing pre-4.8.3 compilers that have been patched
> for PR58854. It looks like Yocto and Buildroot issued releases with
> patched 4.8.2 compilers well before the (fixed) 4.8.3 release. I think
> the most we can reasonably do without breaking some correctly-behaving
> toolchains is to emit a warning.
Yocto has a patch for the PR58854 problem.
http://git.yoctoproject.org/cgit/cgit.cgi/poky/tree/meta/recipes-devtools/gcc/gcc-4.8/0048-PR58854_fix_arm_apcs_epilogue.patch?h=daisy
>
> Hopefully nobody's still using gcc 4.8 from the Linaro 2013.11 toolchain
> release -- since it's a 4.8.3 prerelease from before the fix was
> committed you'll get GCC_VERSION == 40803 but still generate bad code.
>
> > However, I'm rather annoyed that there are people here who have known
> > for some time that GCC 4.8.1 and GCC 4.8.2 _can_ lead to filesystem
> > corruption, and have sat on their backsides doing nothing about getting
> > it blacklisted for something like a year.
>
> Mea culpa, although I hadn't drawn the connection to FS corruption
> reports until now. I have known about the issue for some time, but
> figured the prevalence of the fix in downstream projects largely
> mitigated the issue.
--
Best Regards,
Peter Chen
On Fri, Oct 10, 2014 at 08:44:33PM -0500, Nathan Lynch wrote:
> On 10/10/2014 11:25 AM, Russell King - ARM Linux wrote:
> > We can blacklist these GCC versions quite easily. We already have GCC
> > 3.3 blacklisted, and it's trivial to add others. I would want to include
> > some proper details about the bug, just like the other existing entries
> > we already have in asm-offsets.c, where we name the functions that the
> > compiler is known to break where appropriate.
>
> Before blacklisting anything, it's worth considering that simple version
> checks would break existing pre-4.8.3 compilers that have been patched
> for PR58854. It looks like Yocto and Buildroot issued releases with
> patched 4.8.2 compilers well before the (fixed) 4.8.3 release. I think
> the most we can reasonably do without breaking some correctly-behaving
> toolchains is to emit a warning.
I wish that it was possible to just do the warning thing, but unfortunately
evidence is that many people ignore compiler warnings, because they see
them appearing from the kernel so often they have become de-sensitised
to them.
This is pretty obvious from the various nightly build systems which produce
the same warnings for months without any progress on them - some of them
can be quite serious (oops-able) where printf format strings are concerned.
> > for some time that GCC 4.8.1 and GCC 4.8.2 _can_ lead to filesystem
> > corruption, and have sat on their backsides doing nothing about getting
> > it blacklisted for something like a year.
>
> Mea culpa, although I hadn't drawn the connection to FS corruption
> reports until now. I have known about the issue for some time, but
> figured the prevalence of the fix in downstream projects largely
> mitigated the issue.
It's the FS corruption which swings it in favour of a #error - even if
we have a bunch of compilers around with that version which have the
problem fixed, it's /far/ better to #error out. Those people who know
definitely that they have a fixed compiler can comment out the test
after checking that they do indeed have a fixed version, or are willing
to take the risk.
What we can't do is have kernels built by people who then run into FS
corruption because of this known issue.
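A rough sketch of what such a check could look like, not the actual patch. It
assumes the affected range is 4.8.0 through 4.8.2 (the versions named in this
thread) and the major * 10000 + minor * 100 + patchlevel version encoding. The
names PR58854_FIRST_BAD, PR58854_FIRST_FIXED, THIS_GCC_VERSION and
gcc_is_blacklisted are hypothetical; the function exists only so the range test
can be exercised at run time.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed affected range: 4.8.0 up to, but not including, 4.8.3. */
#define PR58854_FIRST_BAD	40800
#define PR58854_FIRST_FIXED	40803

static_assert(PR58854_FIRST_BAD < PR58854_FIRST_FIXED,
	      "blacklisted range must not be empty");

/* Hypothetical run-time mirror of the preprocessor condition below. */
static bool gcc_is_blacklisted(int gcc_version)
{
	return gcc_version >= PR58854_FIRST_BAD &&
	       gcc_version < PR58854_FIRST_FIXED;
}

/*
 * Build-time form of the same test, using the compiler's own macros.
 * Building this file with a version in the assumed range stops here.
 */
#if defined(__GNUC__) && defined(__GNUC_MINOR__) && defined(__GNUC_PATCHLEVEL__)
#define THIS_GCC_VERSION \
	(__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)
#if THIS_GCC_VERSION >= PR58854_FIRST_BAD && THIS_GCC_VERSION < PR58854_FIRST_FIXED
#error This compiler version is known to miscompile kernels (PR58854).
#endif
#endif
```

Someone who has verified that their compiler carries the fix would comment out
the #error, which is the manual step described above.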
--
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.
On Sat, Oct 11, 2014 at 11:54:32AM +0800, Peter Chen wrote:
> On Fri, Oct 10, 2014 at 08:44:33PM -0500, Nathan Lynch wrote:
> > On 10/10/2014 11:25 AM, Russell King - ARM Linux wrote:
> > >
> > > Right, so GCC 4.8.{1,2} are totally unsuitable for kernel building (and
> > > it seems that this has been known about for some time.)
> >
> > Looking at http://gcc.gnu.org/PR58854 it seems that all 4.8.x for x < 3
> > are affected, as well as 4.9.0.
> >
> > > We can blacklist these GCC versions quite easily. We already have GCC
> > > 3.3 blacklisted, and it's trivial to add others. I would want to include
> > > some proper details about the bug, just like the other existing entries
> > > we already have in asm-offsets.c, where we name the functions that the
> > > compiler is known to break where appropriate.
> >
> > Before blacklisting anything, it's worth considering that simple version
> > checks would break existing pre-4.8.3 compilers that have been patched
> > for PR58854. It looks like Yocto and Buildroot issued releases with
> > patched 4.8.2 compilers well before the (fixed) 4.8.3 release. I think
> > the most we can reasonably do without breaking some correctly-behaving
> > toolchains is to emit a warning.
>
> Yocto has a patch for the PR58854 problem.
>
> http://git.yoctoproject.org/cgit/cgit.cgi/poky/tree/meta/recipes-devtools/gcc/gcc-4.8/0048-PR58854_fix_arm_apcs_epilogue.patch?h=daisy
Right, and we can provide links to these in the comments above the #error
so people have the right places to do a bit of research into whether their
compiler is safe.
It is unfortunate that they are indistinguishable from the broken versions,
but that's really a distro problem for causing that issue themselves -
especially given how serious this bug is.
Hello Russell,
On Sat, Oct 11, 2014 at 11:16 AM, Russell King - ARM Linux
<[email protected]> wrote:
> On Sat, Oct 11, 2014 at 11:54:32AM +0800, Peter Chen wrote:
>> On Fri, Oct 10, 2014 at 08:44:33PM -0500, Nathan Lynch wrote:
>> > On 10/10/2014 11:25 AM, Russell King - ARM Linux wrote:
>> > >
>> > > Right, so GCC 4.8.{1,2} are totally unsuitable for kernel building (and
>> > > it seems that this has been known about for some time.)
>> >
>> > Looking at http://gcc.gnu.org/PR58854 it seems that all 4.8.x for x < 3
>> > are affected, as well as 4.9.0.
>> >
>> > > We can blacklist these GCC versions quite easily. We already have GCC
>> > > 3.3 blacklisted, and it's trivial to add others. I would want to include
>> > > some proper details about the bug, just like the other existing entries
>> > > we already have in asm-offsets.c, where we name the functions that the
>> > > compiler is known to break where appropriate.
>> >
>> > Before blacklisting anything, it's worth considering that simple version
>> > checks would break existing pre-4.8.3 compilers that have been patched
>> > for PR58854. It looks like Yocto and Buildroot issued releases with
>> > patched 4.8.2 compilers well before the (fixed) 4.8.3 release. I think
>> > the most we can reasonably do without breaking some correctly-behaving
>> > toolchains is to emit a warning.
>>
>> Yocto has a patch for the PR58854 problem.
>>
>> http://git.yoctoproject.org/cgit/cgit.cgi/poky/tree/meta/recipes-devtools/gcc/gcc-4.8/0048-PR58854_fix_arm_apcs_epilogue.patch?h=daisy
>
> Right, and we can provide links to these in the comments above the #error
> so people have the right places to do a bit of research into whether their
> compiler is safe.
>
> It is unfortunate that they are indistinguishable from the broken versions,
> but that's really a distro problem for causing that issue themselves -
> especially given how serious this bug is.
What about raising the error only if GCC_PR58854_FIXED is not defined? Then
build systems and people could easily define it if they know their GCC
has the fix applied.
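Roughly, that could look like the sketch below. It is only an illustration:
GCC_PR58854_FIXED is the symbol proposed above, while the 4.8.0 to 4.8.2 range,
the helper must_reject_compiler and the variable pr58854_fix_declared are
assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether a build should be refused.
 * gcc_version uses the major * 10000 + minor * 100 + patchlevel encoding;
 * fix_declared says whether the builder defined GCC_PR58854_FIXED.
 */
static bool must_reject_compiler(int gcc_version, bool fix_declared)
{
	bool in_bad_range;

	assert(gcc_version > 0);
	in_bad_range = gcc_version >= 40800 && gcc_version < 40803;

	/* Only refuse when the version is suspect and nobody vouched for it. */
	return in_bad_range && !fix_declared;
}

/* True when the build was started with -DGCC_PR58854_FIXED. */
#ifdef GCC_PR58854_FIXED
static const bool pr58854_fix_declared = true;
#else
static const bool pr58854_fix_declared = false;
#endif
```

A build system shipping a patched 4.8.2 would add -DGCC_PR58854_FIXED to the
compiler command line; everyone else would still hit the error.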
--
Otavio Salvador O.S. Systems
http://www.ossystems.com.br http://code.ossystems.com.br
Mobile: +55 (53) 9981-7854 Mobile: +1 (347) 903-9750
On 10/11/2014 10:51 AM, Otavio Salvador wrote:
> Hello Russell,
>
> On Sat, Oct 11, 2014 at 11:16 AM, Russell King - ARM Linux
> <[email protected]> wrote:
>> On Sat, Oct 11, 2014 at 11:54:32AM +0800, Peter Chen wrote:
>>> On Fri, Oct 10, 2014 at 08:44:33PM -0500, Nathan Lynch wrote:
>>>> On 10/10/2014 11:25 AM, Russell King - ARM Linux wrote:
>>>>>
>>>>> Right, so GCC 4.8.{1,2} are totally unsuitable for kernel building (and
>>>>> it seems that this has been known about for some time.)
>>>>
>>>> Looking at http://gcc.gnu.org/PR58854 it seems that all 4.8.x for x < 3
>>>> are affected, as well as 4.9.0.
>>>>
>>>>> We can blacklist these GCC versions quite easily. We already have GCC
>>>>> 3.3 blacklisted, and it's trivial to add others. I would want to include
>>>>> some proper details about the bug, just like the other existing entries
>>>>> we already have in asm-offsets.c, where we name the functions that the
>>>>> compiler is known to break where appropriate.
>>>>
>>>> Before blacklisting anything, it's worth considering that simple version
>>>> checks would break existing pre-4.8.3 compilers that have been patched
>>>> for PR58854. It looks like Yocto and Buildroot issued releases with
>>>> patched 4.8.2 compilers well before the (fixed) 4.8.3 release. I think
>>>> the most we can reasonably do without breaking some correctly-behaving
>>>> toolchains is to emit a warning.
>>>
>>> Yocto has a patch for the PR58854 problem.
>>>
>>> http://git.yoctoproject.org/cgit/cgit.cgi/poky/tree/meta/recipes-devtools/gcc/gcc-4.8/0048-PR58854_fix_arm_apcs_epilogue.patch?h=daisy
>>
>> Right, and we can provide links to these in the comments above the #error
>> so people have the right places to do a bit of research into whether their
>> compiler is safe.
>>
>> It is unfortunate that they are indistinguishable from the broken versions,
>> but that's really a distro problem for causing that issue themselves -
>> especially given how serious this bug is.
>
> What about raising the error only if GCC_PR58854_FIXED is not defined? Then
> build systems and people could easily define it if they know their GCC
> has the fix applied.
If the distro/build system/individual is capable of patching gcc, then it
seems reasonable that the same distro/build system/individual is capable
of carrying a patch on top of mainline kernel for building with their
"special" compiler.
On 10/10/2014 08:44 PM, Nathan Lynch wrote:
> On 10/10/2014 11:25 AM, Russell King - ARM Linux wrote:
>>
>> Right, so GCC 4.8.{1,2} are totally unsuitable for kernel building (and
>> it seems that this has been known about for some time.)
>
> Looking at http://gcc.gnu.org/PR58854 it seems that all 4.8.x for x < 3
> are affected, as well as 4.9.0.
Correction -- 4.9.0 has this fixed, even though the GCC PR shows it as a
"known to fail" version.
From: Nathan Lynch
> On 10/10/2014 11:25 AM, Russell King - ARM Linux wrote:
> >
> > Right, so GCC 4.8.{1,2} are totally unsuitable for kernel building (and
> > it seems that this has been known about for some time.)
>
> Looking at http://gcc.gnu.org/PR58854 it seems that all 4.8.x for x < 3
> are affected, as well as 4.9.0.
>
> > We can blacklist these GCC versions quite easily. We already have GCC
> > 3.3 blacklisted, and it's trivial to add others. I would want to include
> > some proper details about the bug, just like the other existing entries
> > we already have in asm-offsets.c, where we name the functions that the
> > compiler is known to break where appropriate.
>
> Before blacklisting anything, it's worth considering that simple version
> checks would break existing pre-4.8.3 compilers that have been patched
> for PR58854. It looks like Yocto and Buildroot issued releases with
> patched 4.8.2 compilers well before the (fixed) 4.8.3 release. I think
> the most we can reasonably do without breaking some correctly-behaving
> toolchains is to emit a warning.
Is it possible to compile a small code fragment and check the generated
code for the bug?
Possibly predicated on the broken version number to avoid false positives.
David
On Mon, Oct 13, 2014 at 09:11:34AM +0000, David Laight wrote:
> From: Nathan Lynch
> > On 10/10/2014 11:25 AM, Russell King - ARM Linux wrote:
> > >
> > > Right, so GCC 4.8.{1,2} are totally unsuitable for kernel building (and
> > > it seems that this has been known about for some time.)
> >
> > Looking at http://gcc.gnu.org/PR58854 it seems that all 4.8.x for x < 3
> > are affected, as well as 4.9.0.
> >
> > > We can blacklist these GCC versions quite easily. We already have GCC
> > > 3.3 blacklisted, and it's trivial to add others. I would want to include
> > > some proper details about the bug, just like the other existing entries
> > > we already have in asm-offsets.c, where we name the functions that the
> > > compiler is known to break where appropriate.
> >
> > Before blacklisting anything, it's worth considering that simple version
> > checks would break existing pre-4.8.3 compilers that have been patched
> > for PR58854. It looks like Yocto and Buildroot issued releases with
> > patched 4.8.2 compilers well before the (fixed) 4.8.3 release. I think
> > the most we can reasonably do without breaking some correctly-behaving
> > toolchains is to emit a warning.
>
> Is it possible to compile a small code fragment and check the generated
> code for the bug?
> Possibly predicated on the broken version number to avoid false positives.
I don't see how - it looks like it requires an interrupt to occur at an
opportune moment to provoke the function to fail. The alternative would
be to parse the assembly generated by the compiler to determine how it
is dealing with the stack.
I think the only viable solution here is that:
1. We blacklist the bad compiler versions outright in the kernel.
2. We /consider/ testing a preprocessor symbol which when present
indicates that these versions are fixed and should not be blacklisted.
The argument for (2) is that /if/ distros want to patch their compilers
to fix the problem, they /also/ have the ability to patch their compilers
to make them identifiable, and that is a far more reliable solution than
trying to parse the assembly output from multiple different GCC versions.
Remember, it's the distro's choice to fix these buggy compilers, so the
onus is on _them_ to deal with the mess they've created by doing so.
On Mon, Oct 13, 2014 at 12:43:07PM +0100, Russell King - ARM Linux wrote:
> On Mon, Oct 13, 2014 at 09:11:34AM +0000, David Laight wrote:
> > From: Nathan Lynch
> > > On 10/10/2014 11:25 AM, Russell King - ARM Linux wrote:
> > > >
> > > > Right, so GCC 4.8.{1,2} are totally unsuitable for kernel building (and
> > > > it seems that this has been known about for some time.)
> > >
> > > Looking at http://gcc.gnu.org/PR58854 it seems that all 4.8.x for x < 3
> > > are affected, as well as 4.9.0.
> > >
> > > > We can blacklist these GCC versions quite easily. We already have GCC
> > > > 3.3 blacklisted, and it's trivial to add others. I would want to include
> > > > some proper details about the bug, just like the other existing entries
> > > > we already have in asm-offsets.c, where we name the functions that the
> > > > compiler is known to break where appropriate.
> > >
> > > Before blacklisting anything, it's worth considering that simple version
> > > checks would break existing pre-4.8.3 compilers that have been patched
> > > for PR58854. It looks like Yocto and Buildroot issued releases with
> > > patched 4.8.2 compilers well before the (fixed) 4.8.3 release. I think
> > > the most we can reasonably do without breaking some correctly-behaving
> > > toolchains is to emit a warning.
> >
> > Is it possible to compile a small code fragment and check the generated
> > code for the bug?
> > Possibly predicated on the broken version number to avoid false positives.
>
> I don't see how - it looks like it requires an interrupt to occur at an
> opportune moment to provoke the function to fail. The alternative would
> be to parse the assembly generated by the compiler to determine how it
> is dealing with the stack.
>
> I think the only viable solution here is that:
>
> 1. We blacklist the bad compiler versions outright in the kernel.
Yes, please do this, it's what we have done for other buggy compiler
versions, no need to do something different here.
> Remember, it's the distro's choice to fix these buggy compilers, so the
> onus is on _them_ to deal with the mess they've created by doing so.
I totally agree.
Is someone going to send this patch, or do I have to write it myself?
thanks,
greg k-h
On 10/13/2014 10:06 PM, Greg KH wrote:
> On Mon, Oct 13, 2014 at 12:43:07PM +0100, Russell King - ARM Linux wrote:
>> On Mon, Oct 13, 2014 at 09:11:34AM +0000, David Laight wrote:
>>> From: Nathan Lynch
>>>> On 10/10/2014 11:25 AM, Russell King - ARM Linux wrote:
>>>>>
>>>>> Right, so GCC 4.8.{1,2} are totally unsuitable for kernel building (and
>>>>> it seems that this has been known about for some time.)
>>>>
>>>> Looking at http://gcc.gnu.org/PR58854 it seems that all 4.8.x for x < 3
>>>> are affected, as well as 4.9.0.
>>>>
>>>>> We can blacklist these GCC versions quite easily. We already have GCC
>>>>> 3.3 blacklisted, and it's trivial to add others. I would want to include
>>>>> some proper details about the bug, just like the other existing entries
>>>>> we already have in asm-offsets.c, where we name the functions that the
>>>>> compiler is known to break where appropriate.
>>>>
>>>> Before blacklisting anything, it's worth considering that simple version
>>>> checks would break existing pre-4.8.3 compilers that have been patched
>>>> for PR58854. It looks like Yocto and Buildroot issued releases with
>>>> patched 4.8.2 compilers well before the (fixed) 4.8.3 release. I think
>>>> the most we can reasonably do without breaking some correctly-behaving
>>>> toolchains is to emit a warning.
>>>
>>> Is it possible to compile a small code fragment and check the generated
>>> code for the bug?
>>> Possibly predicated on the broken version number to avoid false positives.
>>
>> I don't see how - it looks like it requires an interrupt to occur at an
>> opportune moment to provoke the function to fail. The alternative would
>> be to parse the assembly generated by the compiler to determine how it
>> is dealing with the stack.
>>
>> I think the only viable solution here is that:
>>
>> 1. We blacklist the bad compiler versions outright in the kernel.
>
> Yes, please do this, it's what we have done for other buggy compiler
> versions, no need to do something different here.
>
>> Remember, it's the distro's choice to fix these buggy compilers, so the
>> onus is on _them_ to deal with the mess they've created by doing so.
>
> I totally agree.
>
> Is someone going to send this patch, or do I have to write it myself?
I did on Friday (arm: Blacklist gcc 4.8.[012] ...) but Russell said he
was doing it himself.
Regards,
Peter Hurley
On Tue, Oct 14, 2014 at 04:06:40AM +0200, Greg KH wrote:
> On Mon, Oct 13, 2014 at 12:43:07PM +0100, Russell King - ARM Linux wrote:
> > I think the only viable solution here is that:
> >
> > 1. We blacklist the bad compiler versions outright in the kernel.
>
> Yes, please do this, it's what we have done for other buggy compiler
> versions, no need to do something different here.
>
> > Remember, it's the distro's choice to fix these buggy compilers, so the
> > onus is on _them_ to deal with the mess they've created by doing so.
>
> I totally agree.
>
> Is someone going to send this patch, or do I have to write it myself?
As I said, I have a patch in progress, but it seems that there needed
to be some discussion about exactly which compiler versions are affected.
It seems that it's not as trivial as looking at the GCC bug entry.
On Wed, Oct 15, 2014 at 10:23:10PM +0100, Russell King - ARM Linux wrote:
> As I said, I have a patch in progress, but it seems that there needed
> to be some discussion about exactly which compiler versions are affected.
> It seems that it's not as trivial as looking at the GCC bug entry.
... and in any case, it has been a known bug for well over a year now,
and it seems that it doesn't affect _that_ many people. So taking some
extra time to get it properly correct is the _right_ thing to do.
On Wed, Oct 15, 2014 at 10:25:13PM +0100, Russell King - ARM Linux wrote:
> On Wed, Oct 15, 2014 at 10:23:10PM +0100, Russell King - ARM Linux wrote:
> > As I said, I have a patch in progress, but it seems that there needed
> > to be some discussion about exactly which compiler versions are affected.
> > It seems that it's not as trivial as looking at the GCC bug entry.
>
> ... and in any case, it has been a known bug for well over a year now,
> and it seems that it doesn't affect _that_ many people. So taking some
> extra time to get it properly correct is the _right_ thing to do.
Well, this is just great. Pushing out the change which blacklists these
compilers takes out Olof's kernel build system...
Things are not as trivial as they seem.
Hi,
On Sun, Oct 19, 2014 at 10:54:16AM +0100, Russell King - ARM Linux wrote:
> On Wed, Oct 15, 2014 at 10:25:13PM +0100, Russell King - ARM Linux wrote:
> > On Wed, Oct 15, 2014 at 10:23:10PM +0100, Russell King - ARM Linux wrote:
> > > As I said, I have a patch in progress, but it seems that there needed
> > > to be some discussion about exactly which compiler versions are affected.
> > > It seems that it's not as trivial as looking at the GCC bug entry.
> >
> > ... and in any case, it has been a known bug for well over a year now,
> > and it seems that it doesn't affect _that_ many people. So taking some
> > extra time to get it properly correct is the _right_ thing to do.
>
> Well, this is just great. Pushing out the change which blacklists these
> compilers takes out Olof's kernel build system...
>
> Things are not as trivial as they seem.
Maybe Olof just needs to update his compiler. Olof ?
--
balbi
On Sun, Oct 19, 2014 at 8:28 AM, Felipe Balbi <[email protected]> wrote:
> Hi,
>
> On Sun, Oct 19, 2014 at 10:54:16AM +0100, Russell King - ARM Linux wrote:
>> On Wed, Oct 15, 2014 at 10:25:13PM +0100, Russell King - ARM Linux wrote:
>> > On Wed, Oct 15, 2014 at 10:23:10PM +0100, Russell King - ARM Linux wrote:
>> > > As I said, I have a patch in progress, but it seems that there needed
>> > > to be some discussion about exactly which compiler versions are affected.
>> > > It seems that it's not as trivial as looking at the GCC bug entry.
>> >
>> > ... and in any case, it has been a known bug for well over a year now,
>> > and it seems that it doesn't affect _that_ many people. So taking some
>> > extra time to get it properly correct is the _right_ thing to do.
>>
>> Well, this is just great. Pushing out the change which blacklists these
>> compilers takes out Olof's kernel build system...
>>
>> Things are not as trivial as they seem.
>
> Maybe Olof just needs to update his compiler. Olof ?
Yep, doing a run with 4.9.1 to see how it looks. In the past, 4.9 has
been really noisy with warnings, maybe most of them have been fixed by
now.
-Olof