2022-02-09 06:43:22

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH v5 net-next 0/3] use bulk reads for ocelot statistics

On Tue, Feb 08, 2022 at 05:55:32AM -0800, Colin Foster wrote:
> On Tue, Feb 08, 2022 at 01:30:16PM +0000, Vladimir Oltean wrote:
> > On Mon, Feb 07, 2022 at 08:46:41PM -0800, Colin Foster wrote:
> > > Ocelot loops over memory regions to gather stats on different ports.
> > > These regions are mostly continuous, and are ordered. This patch set
> > > uses that information to break the stats reads into regions that can get
> > > read in bulk.
> > >
> > > The motiviation is for general cleanup, but also for SPI. Performing two
> > > back-to-back reads on a SPI bus require toggling the CS line, holding,
> > > re-toggling the CS line, sending 3 address bytes, sending N padding
> > > bytes, then actually performing the read. Bulk reads could reduce almost
> > > all of that overhead, but require that the reads are performed via
> > > regmap_bulk_read.
> > >
> > > v1 > v2: reword commit messages
> > > v2 > v3: correctly mark this for net-next when sending
> > > v3 > v4: calloc array instead of zalloc per review
> > > v4 > v5:
> > > Apply CR suggestions for whitespace
> > > Fix calloc / zalloc mixup
> > > Properly destroy workqueues
> > > Add third commit to split long macros
> > >
> > >
> > > Colin Foster (3):
> > > net: ocelot: align macros for consistency
> > > net: mscc: ocelot: add ability to perform bulk reads
> > > net: mscc: ocelot: use bulk reads for stats
> > >
> > > drivers/net/ethernet/mscc/ocelot.c | 78 ++++++++++++++++++++++-----
> > > drivers/net/ethernet/mscc/ocelot_io.c | 13 +++++
> > > include/soc/mscc/ocelot.h | 57 ++++++++++++++------
> > > 3 files changed, 120 insertions(+), 28 deletions(-)
> > >
> > > --
> > > 2.25.1
> > >
> >
> > Please do not merge these yet. I gave them a run on my board and the
> > kernel crashed on boot.
> >
> > [ 8.043170] mscc_felix 0000:00:00.5: Found PCS at internal MDIO address 0
> > [ 8.050241] mscc_felix 0000:00:00.5: Found PCS at internal MDIO address 1
> > [ 8.057142] mscc_felix 0000:00:00.5: Found PCS at internal MDIO address 2
> > [ 8.064021] mscc_felix 0000:00:00.5: Found PCS at internal MDIO address 3
> > [ 8.128668] ------------[ cut here ]------------
> > [ 8.133315] WARNING: CPU: 1 PID: 44 at drivers/net/dsa/ocelot/felix_vsc9959.c:1007 vsc9959_wm_enc+0x2c/0x40
> > [ 8.143107] Modules linked in:
> > [ 8.146181] CPU: 1 PID: 44 Comm: kworker/u4:2 Not tainted 5.17.0-rc2-07010-ga9b9500ffaac-dirty #2104
> > [ 8.155355] Hardware name: LS1028A RDB Board (DT)
> > [ 8.160079] Workqueue: events_unbound deferred_probe_work_func
> > [ 8.165945] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > [ 8.172940] pc : vsc9959_wm_enc+0x2c/0x40
> > [ 8.176967] lr : ocelot_setup_sharing_watermarks+0x94/0x1fc
> > [ 8.182568] sp : ffff800008863810
> > [ 8.185896] x29: ffff800008863810 x28: 0000000000000308 x27: 0000000000000001
> > [ 8.193079] x26: 000000000300001a x25: 0000000000000008 x24: 0000000000000080
> > [ 8.200261] x23: 0000000088888889 x22: 0000000000000000 x21: 0000000000000000
> > [ 8.207441] x20: 00000000ffff2d17 x19: ffff2d17039d8010 x18: ffffd8e2afa23b28
> > [ 8.214623] x17: 0000000000000015 x16: 0000000000000041 x15: 0000000000000000
> > [ 8.221803] x14: ffffd8e2afa49228 x13: 0000000000025700 x12: 0000000000000009
> > [ 8.228984] x11: ffff2d1703a96c18 x10: 0000000000000004 x9 : ffffd8e2ad6ce0f8
> > [ 8.236165] x8 : ffff2d1700be4240 x7 : ffffd8e2af981000 x6 : 0000000000000001
> > [ 8.243345] x5 : ffffd8e2ad1e1440 x4 : 0000000000000000 x3 : 0000000000000000
> > [ 8.250525] x2 : 0000000000000000 x1 : ffffd8e2ad3d2810 x0 : 00000000000040c0
> > [ 8.257706] Call trace:
> > [ 8.260162] vsc9959_wm_enc+0x2c/0x40
> > [ 8.263841] ocelot_devlink_sb_register+0x33c/0x380
> > [ 8.268742] felix_setup+0x438/0x750
> > [ 8.272334] dsa_register_switch+0x988/0x114c
> > [ 8.276713] felix_pci_probe+0x138/0x1fc
> > [ 8.280654] local_pci_probe+0x4c/0xc0
> > [ 8.284423] pci_device_probe+0x1b0/0x1f0
> > [ 8.288451] really_probe.part.0+0xa4/0x310
> > [ 8.292654] __driver_probe_device+0xa0/0x150
> > [ 8.297030] driver_probe_device+0xcc/0x164
> > [ 8.301231] __device_attach_driver+0xc4/0x130
> > [ 8.305695] bus_for_each_drv+0x84/0xe0
> > [ 8.309547] __device_attach+0xe4/0x190
> > [ 8.313400] device_initial_probe+0x20/0x30
> > [ 8.317601] bus_probe_device+0xac/0xb4
> > [ 8.321454] deferred_probe_work_func+0x98/0xd4
> > [ 8.326004] process_one_work+0x294/0x700
> > [ 8.330037] worker_thread+0x80/0x480
> > [ 8.333717] kthread+0x10c/0x120
> > [ 8.336961] ret_from_fork+0x10/0x20
> > [ 8.340554] irq event stamp: 50432
> > [ 8.343968] hardirqs last enabled at (50431): [<ffffd8e2ade442b0>] _raw_spin_unlock_irqrestore+0x90/0xb0
> > [ 8.353581] hardirqs last disabled at (50432): [<ffffd8e2ade36e44>] el1_dbg+0x24/0x90
> > [ 8.361448] softirqs last enabled at (50148): [<ffffd8e2ac6908f0>] __do_softirq+0x480/0x5f8
> > [ 8.369923] softirqs last disabled at (50143): [<ffffd8e2ac728e3c>] __irq_exit_rcu+0x17c/0x1b0
> > [ 8.378577] ---[ end trace 0000000000000000 ]---
> > [ 8.383304] ------------[ cut here ]------------
> > [ 8.387942] WARNING: CPU: 1 PID: 44 at drivers/net/dsa/ocelot/felix_vsc9959.c:1007 vsc9959_wm_enc+0x2c/0x40
> > [ 8.397729] Modules linked in:
> > [ 8.400800] CPU: 1 PID: 44 Comm: kworker/u4:2 Tainted: G W 5.17.0-rc2-07010-ga9b9500ffaac-dirty #2104
> > [ 8.411369] Hardware name: LS1028A RDB Board (DT)
> > [ 8.416092] Workqueue: events_unbound deferred_probe_work_func
> > [ 8.421955] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > [ 8.428948] pc : vsc9959_wm_enc+0x2c/0x40
> > [ 8.432975] lr : ocelot_setup_sharing_watermarks+0xc0/0x1fc
> > [ 8.438573] sp : ffff800008863810
> > [ 8.441900] x29: ffff800008863810 x28: 0000000000000308 x27: 0000000000000001
> > [ 8.449081] x26: 000000000300001a x25: 0000000000000008 x24: 0000000000000080
> > [ 8.456262] x23: 0000000088888889 x22: 0000000000000000 x21: 0000000000000000
> > [ 8.463443] x20: 00000000ffff2d17 x19: ffff2d17039d8010 x18: ffffd8e2afa23b28
> > [ 8.470623] x17: 0000000000000015 x16: 0000000000000041 x15: 0000000000000000
> > [ 8.477804] x14: ffffd8e2afa49228 x13: 0000000000025700 x12: 0000000000000009
> > [ 8.484984] x11: ffff2d1703a96c18 x10: 0000000000000004 x9 : ffffd8e2ad6ce124
> > [ 8.492165] x8 : ffff2d1700be4240 x7 : ffffd8e2af981000 x6 : 0000000000000001
> > [ 8.499345] x5 : ffffd8e2ad1e1440 x4 : 0000000000000000 x3 : 0000000000000001
> > [ 8.506525] x2 : 0000000000000000 x1 : ffffd8e2ad3d2810 x0 : 00000000000040c0
> > [ 8.513705] Call trace:
> > [ 8.516161] vsc9959_wm_enc+0x2c/0x40
> > [ 8.519840] ocelot_devlink_sb_register+0x33c/0x380
> > [ 8.524740] felix_setup+0x438/0x750
> > [ 8.528331] dsa_register_switch+0x988/0x114c
> > [ 8.532708] felix_pci_probe+0x138/0x1fc
> > [ 8.536648] local_pci_probe+0x4c/0xc0
> > [ 8.540415] pci_device_probe+0x1b0/0x1f0
> > [ 8.544443] really_probe.part.0+0xa4/0x310
> > [ 8.548646] __driver_probe_device+0xa0/0x150
> > [ 8.553022] driver_probe_device+0xcc/0x164
> > [ 8.557225] __device_attach_driver+0xc4/0x130
> > [ 8.561688] bus_for_each_drv+0x84/0xe0
> > [ 8.565540] __device_attach+0xe4/0x190
> > [ 8.569393] device_initial_probe+0x20/0x30
> > [ 8.573594] bus_probe_device+0xac/0xb4
> > [ 8.577447] deferred_probe_work_func+0x98/0xd4
> > [ 8.581997] process_one_work+0x294/0x700
> > [ 8.586026] worker_thread+0x80/0x480
> > [ 8.589706] kthread+0x10c/0x120
> > [ 8.592949] ret_from_fork+0x10/0x20
> > [ 8.596541] irq event stamp: 50450
> > [ 8.599955] hardirqs last enabled at (50449): [<ffffd8e2ade442b0>] _raw_spin_unlock_irqrestore+0x90/0xb0
> > [ 8.609566] hardirqs last disabled at (50450): [<ffffd8e2ade36e44>] el1_dbg+0x24/0x90
> > [ 8.617431] softirqs last enabled at (50446): [<ffffd8e2ac6908f0>] __do_softirq+0x480/0x5f8
> > [ 8.625907] softirqs last disabled at (50435): [<ffffd8e2ac728e3c>] __irq_exit_rcu+0x17c/0x1b0
> > [ 8.634559] ---[ end trace 0000000000000000 ]---
> > [ 8.640586] device eth1 entered promiscuous mode
> > [ 8.645499] Unable to handle kernel paging request at virtual address 00000010400020bc
> > [ 8.653496] Mem abort info:
> > [ 8.656340] ESR = 0x96000044
> > [ 8.659413] EC = 0x25: DABT (current EL), IL = 32 bits
> > [ 8.664784] SET = 0, FnV = 0
> > [ 8.667855] EA = 0, S1PTW = 0
> > [ 8.671044] FSC = 0x04: level 0 translation fault
> > [ 8.675979] Data abort info:
> > [ 8.678875] ISV = 0, ISS = 0x00000044
> > [ 8.682762] CM = 0, WnR = 1
> > [ 8.685795] [00000010400020bc] user address but active_mm is swapper
> > [ 8.692272] Internal error: Oops: 96000044 [#1] PREEMPT SMP
> > [ 8.697865] Modules linked in:
> > [ 8.700928] CPU: 1 PID: 44 Comm: kworker/u4:2 Tainted: G W 5.17.0-rc2-07010-ga9b9500ffaac-dirty #2104
> > [ 8.711490] Hardware name: LS1028A RDB Board (DT)
> > [ 8.716206] Workqueue: events_unbound deferred_probe_work_func
> > [ 8.722065] pstate: 00000005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > [ 8.729051] pc : ocelot_phylink_mac_link_down+0x70/0x314
> > [ 8.734381] lr : felix_phylink_mac_link_down+0x24/0x30
> > [ 8.739536] sp : ffff800008863840
> > [ 8.742856] x29: ffff800008863840 x28: 0000000000000000 x27: ffffd8e2af8ef180
> > [ 8.750022] x26: 0000000000000001 x25: 0000001040002000 x24: 0000000000000000
> > [ 8.757187] x23: 0000000000000180 x22: ffff2d1703a92000 x21: 0000000000000004
> > [ 8.764352] x20: 0000000000000004 x19: ffff2d17039d8010 x18: ffffd8e2afa23b28
> > [ 8.771516] x17: 0000000000040006 x16: 0000000500030008 x15: ffffffffffffffff
> > [ 8.778680] x14: ffffffffffff0000 x13: ffffffffffffffff x12: 0000000000000010
> > [ 8.785845] x11: 0101010101010101 x10: 0000000000000004 x9 : ffffd8e2ad3d0814
> > [ 8.793009] x8 : fefefefefeff6a6d x7 : 0000ffffffffffff x6 : 0000000000000000
> > [ 8.800173] x5 : 00000000ffffffff x4 : 0000000000000001 x3 : 000000000d000007
> > [ 8.807338] x2 : 0000000000000010 x1 : 0000000000000000 x0 : 0000001040002000
> > [ 8.814502] Call trace:
> > [ 8.816949] ocelot_phylink_mac_link_down+0x70/0x314
> > [ 8.821929] felix_phylink_mac_link_down+0x24/0x30
> > [ 8.826734] dsa_port_link_register_of+0xa8/0x240
> > [ 8.831454] dsa_port_setup+0xb4/0x180
> > [ 8.835212] dsa_register_switch+0xdb4/0x114c
> > [ 8.839581] felix_pci_probe+0x138/0x1fc
> > [ 8.843515] local_pci_probe+0x4c/0xc0
> > [ 8.847275] pci_device_probe+0x1b0/0x1f0
> > [ 8.851296] really_probe.part.0+0xa4/0x310
> > [ 8.855490] __driver_probe_device+0xa0/0x150
> > [ 8.859860] driver_probe_device+0xcc/0x164
> > [ 8.864054] __device_attach_driver+0xc4/0x130
> > [ 8.868510] bus_for_each_drv+0x84/0xe0
> > [ 8.872355] __device_attach+0xe4/0x190
> > [ 8.876200] device_initial_probe+0x20/0x30
> > [ 8.880395] bus_probe_device+0xac/0xb4
> > [ 8.884240] deferred_probe_work_func+0x98/0xd4
> > [ 8.888783] process_one_work+0x294/0x700
> > [ 8.892808] worker_thread+0x80/0x480
> > [ 8.896480] kthread+0x10c/0x120
> > [ 8.899715] ret_from_fork+0x10/0x20
> > [ 8.903303] Code: 52800202 f874d839 52800001 aa1903e0 (b900bf25)
> > [ 8.909417] ---[ end trace 0000000000000000 ]---
> >
> > Investigating...
>
> I just did a sanity check on my latest tree and it doesn't crash. I'll
> keep an eye out here as well.
>
> Gives me an opportunity to fix up your other suggestion.
>
> Thanks for reviewing and testing Vladimir!

I don't know why this glitch happened. I couldn't reproduce this a
second time. Booting patch by patch with KASAN enabled also showed
nothing. And the stack traces don't make much sense either. I'll just
ignore it for now.

But anyway, I found some actual bugs in the code while testing, I'll
reply on the individual patch.