2020-09-25 16:44:41

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH 3/5] PCI/ERR: get device before call device driver to avoid null pointer reference

On Thu, Sep 24, 2020 at 10:34:21PM -0400, Ethan Zhao wrote:
> During DPC error injection test we found there is race condition between
> pciehp and DPC driver, null pointer reference caused panic as following

null -> NULL

>
> # setpci -s 64:02.0 0x196.w=000a
> // 64:02.0 is rootport has DPC capability
> # setpci -s 65:00.0 0x04.w=0544
> // 65:00.0 is NVMe SSD populated in above port
> # mount /dev/nvme0n1p1 nvme
>
> (tested on stable 5.8 & ICX platform)
>
> Buffer I/O error on dev nvme0n1p1, logical block 468843328,
> async page read
> BUG: kernel NULL pointer dereference, address: 0000000000000050
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page

Same comment about Oops.

--
With Best Regards,
Andy Shevchenko



2020-09-29 02:37:59

by Ethan Zhao

[permalink] [raw]
Subject: Re: [PATCH 3/5] PCI/ERR: get device before call device driver to avoid null pointer reference

Preferred style, there will be cleared comment in v6.

Thanks,
Ethan

On Sat, Sep 26, 2020 at 12:42 AM Andy Shevchenko
<[email protected]> wrote:
>
> On Thu, Sep 24, 2020 at 10:34:21PM -0400, Ethan Zhao wrote:
> > During DPC error injection test we found there is race condition between
> > pciehp and DPC driver, null pointer reference caused panic as following
>
> null -> NULL
>
> >
> > # setpci -s 64:02.0 0x196.w=000a
> > // 64:02.0 is rootport has DPC capability
> > # setpci -s 65:00.0 0x04.w=0544
> > // 65:00.0 is NVMe SSD populated in above port
> > # mount /dev/nvme0n1p1 nvme
> >
> > (tested on stable 5.8 & ICX platform)
> >
> > Buffer I/O error on dev nvme0n1p1, logical block 468843328,
> > async page read
> > BUG: kernel NULL pointer dereference, address: 0000000000000050
> > #PF: supervisor read access in kernel mode
> > #PF: error_code(0x0000) - not-present page
>
> Same comment about Oops.
>
> --
> With Best Regards,
> Andy Shevchenko
>
>

2020-09-29 08:52:35

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH 3/5] PCI/ERR: get device before call device driver to avoid null pointer reference

On Tue, Sep 29, 2020 at 10:35:14AM +0800, Ethan Zhao wrote:
> Preferred style, there will be cleared comment in v6.

Avoid top postings.

> On Sat, Sep 26, 2020 at 12:42 AM Andy Shevchenko
> <[email protected]> wrote:
> >
> > On Thu, Sep 24, 2020 at 10:34:21PM -0400, Ethan Zhao wrote:
> > > During DPC error injection test we found there is race condition between
> > > pciehp and DPC driver, null pointer reference caused panic as following
> >
> > null -> NULL
> >
> > >
> > > # setpci -s 64:02.0 0x196.w=000a
> > > // 64:02.0 is rootport has DPC capability
> > > # setpci -s 65:00.0 0x04.w=0544
> > > // 65:00.0 is NVMe SSD populated in above port
> > > # mount /dev/nvme0n1p1 nvme
> > >
> > > (tested on stable 5.8 & ICX platform)
> > >
> > > Buffer I/O error on dev nvme0n1p1, logical block 468843328,
> > > async page read
> > > BUG: kernel NULL pointer dereference, address: 0000000000000050
> > > #PF: supervisor read access in kernel mode
> > > #PF: error_code(0x0000) - not-present page
> >
> > Same comment about Oops.

In another thread it was a good advice to move the full Oops (if you think it's
very useful to have) after the cutter '---' line, so it will be in email
archives but Git history.

--
With Best Regards,
Andy Shevchenko


2020-09-29 09:39:47

by Ethan Zhao

[permalink] [raw]
Subject: Re: [PATCH 3/5] PCI/ERR: get device before call device driver to avoid null pointer reference

Andy,

On Tue, Sep 29, 2020 at 4:51 PM Andy Shevchenko
<[email protected]> wrote:
>
> On Tue, Sep 29, 2020 at 10:35:14AM +0800, Ethan Zhao wrote:
> > Preferred style, there will be cleared comment in v6.
>
> Avoid top postings.
>
> > On Sat, Sep 26, 2020 at 12:42 AM Andy Shevchenko
> > <[email protected]> wrote:
> > >
> > > On Thu, Sep 24, 2020 at 10:34:21PM -0400, Ethan Zhao wrote:
> > > > During DPC error injection test we found there is race condition between
> > > > pciehp and DPC driver, null pointer reference caused panic as following
> > >
> > > null -> NULL
> > >
> > > >
> > > > # setpci -s 64:02.0 0x196.w=000a
> > > > // 64:02.0 is rootport has DPC capability
> > > > # setpci -s 65:00.0 0x04.w=0544
> > > > // 65:00.0 is NVMe SSD populated in above port
> > > > # mount /dev/nvme0n1p1 nvme
> > > >
> > > > (tested on stable 5.8 & ICX platform)
> > > >
> > > > Buffer I/O error on dev nvme0n1p1, logical block 468843328,
> > > > async page read
> > > > BUG: kernel NULL pointer dereference, address: 0000000000000050
> > > > #PF: supervisor read access in kernel mode
> > > > #PF: error_code(0x0000) - not-present page
> > >
> > > Same comment about Oops.
>
> In another thread it was a good advice to move the full Oops (if you think it's
> very useful to have) after the cutter '---' line, so it will be in email
> archives but Git history.

So git history wouldn't give any of the Oops context, and he/she has
to access LKML,
if offline, then ...lost.

Thanks,
Ethan
>
> --
> With Best Regards,
> Andy Shevchenko
>
>

2020-09-29 10:50:47

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH 3/5] PCI/ERR: get device before call device driver to avoid null pointer reference

On Tue, Sep 29, 2020 at 05:38:00PM +0800, Ethan Zhao wrote:
> On Tue, Sep 29, 2020 at 4:51 PM Andy Shevchenko
> <[email protected]> wrote:
> > On Tue, Sep 29, 2020 at 10:35:14AM +0800, Ethan Zhao wrote:
> > > Preferred style, there will be cleared comment in v6.
> >
> > Avoid top postings.
> >
> > > On Sat, Sep 26, 2020 at 12:42 AM Andy Shevchenko
> > > <[email protected]> wrote:
> > > > On Thu, Sep 24, 2020 at 10:34:21PM -0400, Ethan Zhao wrote:

...

> > > > > Buffer I/O error on dev nvme0n1p1, logical block 468843328,
> > > > > async page read
> > > > > BUG: kernel NULL pointer dereference, address: 0000000000000050
> > > > > #PF: supervisor read access in kernel mode
> > > > > #PF: error_code(0x0000) - not-present page
> > > >
> > > > Same comment about Oops.
> >
> > In another thread it was a good advice to move the full Oops (if you think it's
> > very useful to have) after the cutter '---' line, so it will be in email
> > archives but Git history.
>
> So git history wouldn't give any of the Oops context, and he/she has
> to access LKML,
> if offline, then ...lost.

Tell me, do you really think the line:
#PF: error_code(0x0000) - not-present page
makes any sense in the commit message?

I do not think so. And so on, for almost 60% of the Oops.
Really, to help one person you will make millions suffering. It's not okay.

--
With Best Regards,
Andy Shevchenko


2020-09-30 02:08:56

by Ethan Zhao

[permalink] [raw]
Subject: Re: [PATCH 3/5] PCI/ERR: get device before call device driver to avoid null pointer reference

On Tue, Sep 29, 2020 at 6:48 PM Andy Shevchenko
<[email protected]> wrote:
>
> On Tue, Sep 29, 2020 at 05:38:00PM +0800, Ethan Zhao wrote:
> > On Tue, Sep 29, 2020 at 4:51 PM Andy Shevchenko
> > <[email protected]> wrote:
> > > On Tue, Sep 29, 2020 at 10:35:14AM +0800, Ethan Zhao wrote:
> > > > Preferred style, there will be cleared comment in v6.
> > >
> > > Avoid top postings.
> > >
> > > > On Sat, Sep 26, 2020 at 12:42 AM Andy Shevchenko
> > > > <[email protected]> wrote:
> > > > > On Thu, Sep 24, 2020 at 10:34:21PM -0400, Ethan Zhao wrote:
>
> ...
>
> > > > > > Buffer I/O error on dev nvme0n1p1, logical block 468843328,
> > > > > > async page read
> > > > > > BUG: kernel NULL pointer dereference, address: 0000000000000050
> > > > > > #PF: supervisor read access in kernel mode
> > > > > > #PF: error_code(0x0000) - not-present page
> > > > >
> > > > > Same comment about Oops.
> > >
> > > In another thread it was a good advice to move the full Oops (if you think it's
> > > very useful to have) after the cutter '---' line, so it will be in email
> > > archives but Git history.
> >
> > So git history wouldn't give any of the Oops context, and he/she has
> > to access LKML,
> > if offline, then ...lost.
>
> Tell me, do you really think the line:
> #PF: error_code(0x0000) - not-present page
> makes any sense in the commit message?
>
> I do not think so. And so on, for almost 60% of the Oops.
> Really, to help one person you will make millions suffering. It's not okay.
>
If you and millions feel so suffered, why not try to simplify the
Oops code to not
output nonsense too much to console and log. :)

They might not be so important to old birds as you. but it is easy to cut off
the road for newcomers.

Anyway, it is not the focus of this patchset. help to take a look and
try the code.


Thanks,
Ethan

> --
> With Best Regards,
> Andy Shevchenko
>
>