2021-02-20 04:36:13

by Yejune Deng

[permalink] [raw]
Subject: [PATCH] arp: Remove the arp_hh_ops structure

The 'arp_hh_ops' structure is similar to the 'arp_generic_ops' structure.
So remove the 'arp_hh_ops' structure.

Signed-off-by: Yejune Deng <[email protected]>
---
net/ipv4/arp.c | 19 +++----------------
1 file changed, 3 insertions(+), 16 deletions(-)

diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 922dd73e5740..6d60d9b89286 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -135,14 +135,6 @@ static const struct neigh_ops arp_generic_ops = {
.connected_output = neigh_connected_output,
};

-static const struct neigh_ops arp_hh_ops = {
- .family = AF_INET,
- .solicit = arp_solicit,
- .error_report = arp_error_report,
- .output = neigh_resolve_output,
- .connected_output = neigh_resolve_output,
-};
-
static const struct neigh_ops arp_direct_ops = {
.family = AF_INET,
.output = neigh_direct_output,
@@ -277,15 +269,10 @@ static int arp_constructor(struct neighbour *neigh)
memcpy(neigh->ha, dev->broadcast, dev->addr_len);
}

- if (dev->header_ops->cache)
- neigh->ops = &arp_hh_ops;
- else
- neigh->ops = &arp_generic_ops;
-
- if (neigh->nud_state & NUD_VALID)
- neigh->output = neigh->ops->connected_output;
+ if (!dev->header_ops->cache && (neigh->nud_state & NUD_VALID))
+ neigh->output = arp_generic_ops.connected_output;
else
- neigh->output = neigh->ops->output;
+ neigh->output = arp_generic_ops.output;
}
return 0;
}
--
2.29.0


2021-02-21 01:59:47

by David Ahern

[permalink] [raw]
Subject: Re: [PATCH] arp: Remove the arp_hh_ops structure

On 2/19/21 9:32 PM, Yejune Deng wrote:
> static const struct neigh_ops arp_direct_ops = {
> .family = AF_INET,
> .output = neigh_direct_output,
> @@ -277,15 +269,10 @@ static int arp_constructor(struct neighbour *neigh)
> memcpy(neigh->ha, dev->broadcast, dev->addr_len);
> }
>
> - if (dev->header_ops->cache)
> - neigh->ops = &arp_hh_ops;
> - else
> - neigh->ops = &arp_generic_ops;

How did you test this?

you took out the neigh->ops assignment, so all of the neigh->ops in
net/core/neighbour.c are going to cause a NULL dereference.


> -
> - if (neigh->nud_state & NUD_VALID)
> - neigh->output = neigh->ops->connected_output;
> + if (!dev->header_ops->cache && (neigh->nud_state & NUD_VALID))
> + neigh->output = arp_generic_ops.connected_output;
> else
> - neigh->output = neigh->ops->output;
> + neigh->output = arp_generic_ops.output;
> }
> return 0;
> }
>

2021-02-21 02:37:30

by Yejune Deng

[permalink] [raw]
Subject: Re: [PATCH] arp: Remove the arp_hh_ops structure

Sorry,it was my fault, I will resubmit.

On Sun, Feb 21, 2021 at 9:54 AM David Ahern <[email protected]> wrote:
>
> On 2/19/21 9:32 PM, Yejune Deng wrote:
> > static const struct neigh_ops arp_direct_ops = {
> > .family = AF_INET,
> > .output = neigh_direct_output,
> > @@ -277,15 +269,10 @@ static int arp_constructor(struct neighbour *neigh)
> > memcpy(neigh->ha, dev->broadcast, dev->addr_len);
> > }
> >
> > - if (dev->header_ops->cache)
> > - neigh->ops = &arp_hh_ops;
> > - else
> > - neigh->ops = &arp_generic_ops;
>
> How did you test this?
>
> you took out the neigh->ops assignment, so all of the neigh->ops in
> net/core/neighbour.c are going to cause a NULL dereference.
>
>
> > -
> > - if (neigh->nud_state & NUD_VALID)
> > - neigh->output = neigh->ops->connected_output;
> > + if (!dev->header_ops->cache && (neigh->nud_state & NUD_VALID))
> > + neigh->output = arp_generic_ops.connected_output;
> > else
> > - neigh->output = neigh->ops->output;
> > + neigh->output = arp_generic_ops.output;
> > }
> > return 0;
> > }
> >
>

2021-02-21 14:53:03

by Oliver Sang

[permalink] [raw]
Subject: [arp] 4591591ab7: RIP:neigh_probe


Greeting,

FYI, we noticed the following commit (built with gcc-9):

commit: 4591591ab715a4e480d58a726527f2cd252f3eb1 ("arp: Remove the arp_hh_ops structure")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git Yejune-Deng/arp-Remove-the-arp_hh_ops-structure/20210220-123704


in testcase: locktorture
version:
with following parameters:

runtime: 300s
test: cpuhotplug

test-description: This torture test consists of creating a number of kernel threads which acquire the lock and hold it for specific amount of time, thus simulating different critical region behaviors.
test-url: https://www.kernel.org/doc/Documentation/locking/locktorture.txt


on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 8G

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


+-------------------------------------------------------+------------+------------+
| | 38b5133ad6 | 4591591ab7 |
+-------------------------------------------------------+------------+------------+
| boot_successes | 12 | 0 |
| boot_failures | 0 | 12 |
| BUG:kernel_NULL_pointer_dereference,address | 0 | 11 |
| Oops:#[##] | 0 | 12 |
| RIP:neigh_probe | 0 | 12 |
| Kernel_panic-not_syncing:Fatal_exception_in_interrupt | 0 | 12 |
+-------------------------------------------------------+------------+------------+


If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>



[ OK ] Reached target System Time Synchronized.
[ OK ] Reached target System Initialization.
8.973653] #PF: supervisor read access in kernel mode
[ 8.975027] #PF: error_code(0x0000) - not-present page
[ 8.976310] PGD 0 P4D 0
[ 8.977036] Oops: 0000 [#1] SMP PTI
[ 8.977973] CPU: 1 PID: 210 Comm: sd-resolve Not tainted 5.11.0-rc7-02046-g4591591ab715 #1
[ 8.979998] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 8.981996] RIP: 0010:neigh_probe (kbuild/src/consumer/net/core/neighbour.c:1009)
[ 8.983081] Code: 43 38 48 39 c7 74 0f be 20 0a 00 00 e8 ba d4 fd ff 48 89 c5 eb 02 31 ed c6 43 28 00 48 8b 83 50 01 00 00 65 ff 0d 81 85 87 7d <48> 8b 40 08 48 85 c0 74 0b 48 89 ee 48 89 df e8 ad 43 46 00 f0 ff
All code
========
0: 43 38 48 39 rex.XB cmp %cl,0x39(%r8)
4: c7 (bad)
5: 74 0f je 0x16
7: be 20 0a 00 00 mov $0xa20,%esi
c: e8 ba d4 fd ff callq 0xfffffffffffdd4cb
11: 48 89 c5 mov %rax,%rbp
14: eb 02 jmp 0x18
16: 31 ed xor %ebp,%ebp
18: c6 43 28 00 movb $0x0,0x28(%rbx)
1c: 48 8b 83 50 01 00 00 mov 0x150(%rbx),%rax
23: 65 ff 0d 81 85 87 7d decl %gs:0x7d878581(%rip) # 0x7d8785ab
2a:* 48 8b 40 08 mov 0x8(%rax),%rax <-- trapping instruction
2e: 48 85 c0 test %rax,%rax
31: 74 0b je 0x3e
33: 48 89 ee mov %rbp,%rsi
36: 48 89 df mov %rbx,%rdi
39: e8 ad 43 46 00 callq 0x4643eb
3e: f0 lock
3f: ff .byte 0xff

Code starting with the faulting instruction
===========================================
0: 48 8b 40 08 mov 0x8(%rax),%rax
4: 48 85 c0 test %rax,%rax
7: 74 0b je 0x14
9: 48 89 ee mov %rbp,%rsi
c: 48 89 df mov %rbx,%rdi
f: e8 ad 43 46 00 callq 0x4643c1
14: f0 lock
15: ff .byte 0xff
[ 8.987379] RSP: 0018:ffff9ef6806078f0 EFLAGS: 00010286
[ 8.988690] RAX: 0000000000000000 RBX: ffff8913462b6c00 RCX: 0000000000000000
[ 8.990407] RDX: 0000000000000000 RSI: ffff891346125800 RDI: ffff891346125600
[ 8.992143] RBP: ffff891346125600 R08: 0000000000000100 R09: ffffffff83980870
[ 8.993682] R10: ffff8913468ee818 R11: 0000000000000002 R12: 0000000000000001
[ 8.995282] R13: ffff891346125800 R14: ffff891346835e70 R15: 000000000000002f
[ 8.996959] FS: 00007f354281f700(0000) GS:ffff8913ffd00000(0000) knlGS:0000000000000000
[ 8.998964] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9.000365] CR2: 0000000000000008 CR3: 000000013630c000 CR4: 00000000000406e0
[ 9.002077] Call Trace:
[ 9.002823] __neigh_event_send (kbuild/src/consumer/net/core/neighbour.c:1171)
[ 9.003849] neigh_resolve_output (kbuild/src/consumer/net/core/neighbour.c:1475)
[ 9.004888] ip_finish_output2 (kbuild/src/consumer/include/net/neighbour.h:510 kbuild/src/consumer/net/ipv4/ip_output.c:230)
[ 9.005886] ip_output (kbuild/src/consumer/net/ipv4/ip_output.c:436)
[ 9.006820] ? __ip_finish_output (kbuild/src/consumer/net/ipv4/ip_output.c:312)


To reproduce:

# build kernel
cd linux
cp config-5.11.0-rc7-02046-g4591591ab715 .config
make HOSTCC=gcc-9 CC=gcc-9 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
make HOSTCC=gcc-9 CC=gcc-9 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
cd <mod-install-dir>
find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz


git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email



Thanks,
Oliver Sang


Attachments:
(No filename) (5.63 kB)
config-5.11.0-rc7-02046-g4591591ab715 (175.24 kB)
job-script (4.73 kB)
dmesg.xz (13.29 kB)
Download all attachments