On Thu, Sep 24, 2020 at 10:34:21PM -0400, Ethan Zhao wrote:
> During DPC error injection test we found there is race condition between
> pciehp and DPC driver, null pointer reference caused panic as following
null -> NULL
>
> # setpci -s 64:02.0 0x196.w=000a
> // 64:02.0 is rootport has DPC capability
> # setpci -s 65:00.0 0x04.w=0544
> // 65:00.0 is NVMe SSD populated in above port
> # mount /dev/nvme0n1p1 nvme
>
> (tested on stable 5.8 & ICX platform)
>
> Buffer I/O error on dev nvme0n1p1, logical block 468843328,
> async page read
> BUG: kernel NULL pointer dereference, address: 0000000000000050
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
Same comment about Oops.
--
With Best Regards,
Andy Shevchenko
Preferred style, there will be cleared comment in v6.
Thanks,
Ethan
On Sat, Sep 26, 2020 at 12:42 AM Andy Shevchenko
<[email protected]> wrote:
>
> On Thu, Sep 24, 2020 at 10:34:21PM -0400, Ethan Zhao wrote:
> > During DPC error injection test we found there is race condition between
> > pciehp and DPC driver, null pointer reference caused panic as following
>
> null -> NULL
>
> >
> > # setpci -s 64:02.0 0x196.w=000a
> > // 64:02.0 is rootport has DPC capability
> > # setpci -s 65:00.0 0x04.w=0544
> > // 65:00.0 is NVMe SSD populated in above port
> > # mount /dev/nvme0n1p1 nvme
> >
> > (tested on stable 5.8 & ICX platform)
> >
> > Buffer I/O error on dev nvme0n1p1, logical block 468843328,
> > async page read
> > BUG: kernel NULL pointer dereference, address: 0000000000000050
> > #PF: supervisor read access in kernel mode
> > #PF: error_code(0x0000) - not-present page
>
> Same comment about Oops.
>
> --
> With Best Regards,
> Andy Shevchenko
>
>
On Tue, Sep 29, 2020 at 10:35:14AM +0800, Ethan Zhao wrote:
> Preferred style, there will be cleared comment in v6.
Avoid top postings.
> On Sat, Sep 26, 2020 at 12:42 AM Andy Shevchenko
> <[email protected]> wrote:
> >
> > On Thu, Sep 24, 2020 at 10:34:21PM -0400, Ethan Zhao wrote:
> > > During DPC error injection test we found there is race condition between
> > > pciehp and DPC driver, null pointer reference caused panic as following
> >
> > null -> NULL
> >
> > >
> > > # setpci -s 64:02.0 0x196.w=000a
> > > // 64:02.0 is rootport has DPC capability
> > > # setpci -s 65:00.0 0x04.w=0544
> > > // 65:00.0 is NVMe SSD populated in above port
> > > # mount /dev/nvme0n1p1 nvme
> > >
> > > (tested on stable 5.8 & ICX platform)
> > >
> > > Buffer I/O error on dev nvme0n1p1, logical block 468843328,
> > > async page read
> > > BUG: kernel NULL pointer dereference, address: 0000000000000050
> > > #PF: supervisor read access in kernel mode
> > > #PF: error_code(0x0000) - not-present page
> >
> > Same comment about Oops.
In another thread it was a good advice to move the full Oops (if you think it's
very useful to have) after the cutter '---' line, so it will be in email
archives but Git history.
--
With Best Regards,
Andy Shevchenko
Andy,
On Tue, Sep 29, 2020 at 4:51 PM Andy Shevchenko
<[email protected]> wrote:
>
> On Tue, Sep 29, 2020 at 10:35:14AM +0800, Ethan Zhao wrote:
> > Preferred style, there will be cleared comment in v6.
>
> Avoid top postings.
>
> > On Sat, Sep 26, 2020 at 12:42 AM Andy Shevchenko
> > <[email protected]> wrote:
> > >
> > > On Thu, Sep 24, 2020 at 10:34:21PM -0400, Ethan Zhao wrote:
> > > > During DPC error injection test we found there is race condition between
> > > > pciehp and DPC driver, null pointer reference caused panic as following
> > >
> > > null -> NULL
> > >
> > > >
> > > > # setpci -s 64:02.0 0x196.w=000a
> > > > // 64:02.0 is rootport has DPC capability
> > > > # setpci -s 65:00.0 0x04.w=0544
> > > > // 65:00.0 is NVMe SSD populated in above port
> > > > # mount /dev/nvme0n1p1 nvme
> > > >
> > > > (tested on stable 5.8 & ICX platform)
> > > >
> > > > Buffer I/O error on dev nvme0n1p1, logical block 468843328,
> > > > async page read
> > > > BUG: kernel NULL pointer dereference, address: 0000000000000050
> > > > #PF: supervisor read access in kernel mode
> > > > #PF: error_code(0x0000) - not-present page
> > >
> > > Same comment about Oops.
>
> In another thread it was a good advice to move the full Oops (if you think it's
> very useful to have) after the cutter '---' line, so it will be in email
> archives but Git history.
So git history wouldn't give any of the Oops context, and he/she has
to access LKML,
if offline, then ...lost.
Thanks,
Ethan
>
> --
> With Best Regards,
> Andy Shevchenko
>
>
On Tue, Sep 29, 2020 at 05:38:00PM +0800, Ethan Zhao wrote:
> On Tue, Sep 29, 2020 at 4:51 PM Andy Shevchenko
> <[email protected]> wrote:
> > On Tue, Sep 29, 2020 at 10:35:14AM +0800, Ethan Zhao wrote:
> > > Preferred style, there will be cleared comment in v6.
> >
> > Avoid top postings.
> >
> > > On Sat, Sep 26, 2020 at 12:42 AM Andy Shevchenko
> > > <[email protected]> wrote:
> > > > On Thu, Sep 24, 2020 at 10:34:21PM -0400, Ethan Zhao wrote:
...
> > > > > Buffer I/O error on dev nvme0n1p1, logical block 468843328,
> > > > > async page read
> > > > > BUG: kernel NULL pointer dereference, address: 0000000000000050
> > > > > #PF: supervisor read access in kernel mode
> > > > > #PF: error_code(0x0000) - not-present page
> > > >
> > > > Same comment about Oops.
> >
> > In another thread it was a good advice to move the full Oops (if you think it's
> > very useful to have) after the cutter '---' line, so it will be in email
> > archives but Git history.
>
> So git history wouldn't give any of the Oops context, and he/she has
> to access LKML,
> if offline, then ...lost.
Tell me, do you really think the line:
#PF: error_code(0x0000) - not-present page
makes any sense in the commit message?
I do not think so. And so on, for almost 60% of the Oops.
Really, to help one person you will make millions suffering. It's not okay.
--
With Best Regards,
Andy Shevchenko
On Tue, Sep 29, 2020 at 6:48 PM Andy Shevchenko
<[email protected]> wrote:
>
> On Tue, Sep 29, 2020 at 05:38:00PM +0800, Ethan Zhao wrote:
> > On Tue, Sep 29, 2020 at 4:51 PM Andy Shevchenko
> > <[email protected]> wrote:
> > > On Tue, Sep 29, 2020 at 10:35:14AM +0800, Ethan Zhao wrote:
> > > > Preferred style, there will be cleared comment in v6.
> > >
> > > Avoid top postings.
> > >
> > > > On Sat, Sep 26, 2020 at 12:42 AM Andy Shevchenko
> > > > <[email protected]> wrote:
> > > > > On Thu, Sep 24, 2020 at 10:34:21PM -0400, Ethan Zhao wrote:
>
> ...
>
> > > > > > Buffer I/O error on dev nvme0n1p1, logical block 468843328,
> > > > > > async page read
> > > > > > BUG: kernel NULL pointer dereference, address: 0000000000000050
> > > > > > #PF: supervisor read access in kernel mode
> > > > > > #PF: error_code(0x0000) - not-present page
> > > > >
> > > > > Same comment about Oops.
> > >
> > > In another thread it was a good advice to move the full Oops (if you think it's
> > > very useful to have) after the cutter '---' line, so it will be in email
> > > archives but Git history.
> >
> > So git history wouldn't give any of the Oops context, and he/she has
> > to access LKML,
> > if offline, then ...lost.
>
> Tell me, do you really think the line:
> #PF: error_code(0x0000) - not-present page
> makes any sense in the commit message?
>
> I do not think so. And so on, for almost 60% of the Oops.
> Really, to help one person you will make millions suffering. It's not okay.
>
If you and millions feel so suffered, why not try to simplify the
Oops code to not
output nonsense too much to console and log. :)
They might not be so important to old birds as you. but it is easy to cut off
the road for newcomers.
Anyway, it is not the focus of this patchset. help to take a look and
try the code.
Thanks,
Ethan
> --
> With Best Regards,
> Andy Shevchenko
>
>