My Powerbook G4 Aluminum generates a fatal splat early in the boot process, just
after identifying the driver for the disk. Unfortunately, it turns off almost
immediately, thus I cannot report the message. After this bug has been
triggered, the system clock has been reset to Dec. 31, 1969. I assume this is a
side effect of an uncontrolled DMA operation.
This problem has been bisected to commit 5657933dbb6e ("treewide: Move dma_ops
from struct dev_archdata into struct device").
Larry
On Wed, 2017-03-01 at 21:26 -0600, Larry Finger wrote:
> My Powerbook G4 Aluminum generates a fatal splat early in the boot process, just
> after identifying the driver for the disk. Unfortunately, it turns off almost
> immediately, thus I cannot report the message. After this bug has been
> triggered, the system clock has been reset to Dec. 31, 1969. I assume this is a
> side effect of an uncontrolled DMA operation.
>
> This problem has been bisected to commit 5657933dbb6e ("treewide: Move dma_ops
> from struct dev_archdata into struct device").
Can you provide the kernel .config of that G4?
Thanks,
Bart.
On Wed, 2017-03-01 at 21:26 -0600, Larry Finger wrote:
> My Powerbook G4 Aluminum generates a fatal splat early in the boot process, just
> after identifying the driver for the disk. Unfortunately, it turns off almost
> immediately, thus I cannot report the message. After this bug has been
> triggered, the system clock has been reset to Dec. 31, 1969. I assume this is a
> side effect of an uncontrolled DMA operation.
>
> This problem has been bisected to commit 5657933dbb6e ("treewide: Move dma_ops
> from struct dev_archdata into struct device").
Side effect of a crash during boot... the PMU gets upset when we crash while
there's a request in flight, that's probably what is happening.
As to why that commit is broken, I don't have time to look into it right now,
maybe next week if nobody beats me to it.
Cheers,
Ben.
On Thu, 2017-03-02 at 16:14 +1100, Benjamin Herrenschmidt wrote:
> On Wed, 2017-03-01 at 21:26 -0600, Larry Finger wrote:
> > My Powerbook G4 Aluminum generates a fatal splat early in the boot process, just
> > after identifying the driver for the disk. Unfortunately, it turns off almost
> > immediately, thus I cannot report the message. After this bug has been
> > triggered, the system clock has been reset to Dec. 31, 1969. I assume this is a
> > side effect of an uncontrolled DMA operation.
> >
> > This problem has been bisected to commit 5657933dbb6e ("treewide: Move dma_ops
> > from struct dev_archdata into struct device").
>
> Side effect of a crash during boot... the PMU gets upset when we crash while
> there's a request in flight, that's probably what is happening.
>
> As to why that commit is broken, I don't have time to look into it right now,
> maybe next week if nobody beats me to it.
Hello Ben,
Thanks. I will try to reproduce what has been reported with qemu-ppc on Friday to
double check that it's really commit 5657933dbb6e that is causing this. I reread it
but couldn't find any reason why it works on x86-64 but not on powerpc. BTW, the
following patch is needed on at least s390 but probably also on powerpc to restore
InfiniBand support: http://marc.info/?l=linux-rdma&m=148823342415501&w=2. This is
why I asked for the kernel config.
Bart.
On 03/01/2017 11:22 PM, Bart Van Assche wrote:
> On Thu, 2017-03-02 at 16:14 +1100, Benjamin Herrenschmidt wrote:
>> On Wed, 2017-03-01 at 21:26 -0600, Larry Finger wrote:
>>> My Powerbook G4 Aluminum generates a fatal splat early in the boot process, just
>>> after identifying the driver for the disk. Unfortunately, it turns off almost
>>> immediately, thus I cannot report the message. After this bug has been
>>> triggered, the system clock has been reset to Dec. 31, 1969. I assume this is a
>>> side effect of an uncontrolled DMA operation.
>>>
>>> This problem has been bisected to commit 5657933dbb6e ("treewide: Move dma_ops
>>> from struct dev_archdata into struct device").
>>
>> Side effect of a crash during boot... the PMU gets upset when we crash while
>> there's a request in flight, that's probably what is happening.
>>
>> As to why that commit is broken, I don't have time to look into it right now,
>> maybe next week if nobody beats me to it.
>
> Hello Ben,
>
> Thanks. I will try to reproduce what has been reported with qemu-ppc on Friday to
> double check that it's really commit 5657933dbb6e that is causing this. I reread it
> but couldn't find any reason why it works on x86-64 but not on powerpc. BTW, the
> following patch is needed on at least s390 but probably also on powerpc to restore
> InfiniBand support: http://marc.info/?l=linux-rdma&m=148823342415501&w=2. This is
> why I asked for the kernel config.
I am very confident of the bisection. A kernel built from commit 5299709, which
is just before 5657933, boots correctly.
As you have probably seen, my configuration does not include InfiniBand support.
Larry
On 03/01/2017 10:07 PM, Bart Van Assche wrote:
> On Wed, 2017-03-01 at 21:26 -0600, Larry Finger wrote:
>> My Powerbook G4 Aluminum generates a fatal splat early in the boot process, just
>> after identifying the driver for the disk. Unfortunately, it turns off almost
>> immediately, thus I cannot report the message. After this bug has been
>> triggered, the system clock has been reset to Dec. 31, 1969. I assume this is a
>> side effect of an uncontrolled DMA operation.
>>
>> This problem has been bisected to commit 5657933dbb6e ("treewide: Move dma_ops
>> from struct dev_archdata into struct device").
>
> Can you provide the kernel .config of that G4?
Attached.
Larry
On Thu, 2017-03-02 at 16:14 +1100, Benjamin Herrenschmidt wrote:
> On Wed, 2017-03-01 at 21:26 -0600, Larry Finger wrote:
> > My Powerbook G4 Aluminum generates a fatal splat early in the boot process, just
> > after identifying the driver for the disk. Unfortunately, it turns off almost
> > immediately, thus I cannot report the message. After this bug has been
> > triggered, the system clock has been reset to Dec. 31, 1969. I assume this is a
> > side effect of an uncontrolled DMA operation.
> >
> > This problem has been bisected to commit 5657933dbb6e ("treewide: Move dma_ops
> > from struct dev_archdata into struct device").
>
> Side effect of a crash during boot... the PMU gets upset when we crash while
> there's a request in flight, that's probably what is happening.
>
> As to why that commit is broken, I don't have time to look into it right now,
> maybe next week if nobody beats me to it.
The results of my attempts so far to create a PPC VM:
* I have not found a CPU / architecture / CD bus type combination for which the
Gentoo installation CD was able to recognize the boot medium
(http://distfiles.gentoo.org/releases/ppc/autobuilds/20140713/install-powerpc-minimal-20140713.iso).
* Same problem with the openSUSE Leap CD: apparently that software does not recognize
the qemu SCSI CD
(http://mirror.datto.com/opensuse/ports/ppc/distribution/leap/42.2/iso/openSUSE-Leap-42.2-NET-ppc64le-Build0156-Media.iso).
* openSUSE Tumbleweed recognizes the qemu SCSI CD. After it has loaded the installation
software, however, it displays a message reporting that an error occurred during
installation and displays a menu. Making a selection from that menu is not possible
because the only key from the keyboard it recognizes is "Enter"
(http://mirror.datto.com/opensuse/ports/ppc/tumbleweed/iso/openSUSE-Tumbleweed-DVD-ppc64-Snapshot20170303-Media.iso).
Does anyone have a suggestion for how to proceed?
Thanks,
Bart.
On 03/06/2017 12:02 PM, Bart Van Assche wrote:
> On Thu, 2017-03-02 at 16:14 +1100, Benjamin Herrenschmidt wrote:
>> On Wed, 2017-03-01 at 21:26 -0600, Larry Finger wrote:
>>> My Powerbook G4 Aluminum generates a fatal splat early in the boot process, just
>>> after identifying the driver for the disk. Unfortunately, it turns off almost
>>> immediately, thus I cannot report the message. After this bug has been
>>> triggered, the system clock has been reset to Dec. 31, 1969. I assume this is a
>>> side effect of an uncontrolled DMA operation.
>>>
>>> This problem has been bisected to commit 5657933dbb6e ("treewide: Move dma_ops
>>> from struct dev_archdata into struct device").
>>
>> Side effect of a crash during boot... the PMU gets upset when we crash while
>> there's a request in flight, that's probably what is happening.
>>
>> As to why that commit is broken, I don't have time to look into it right now,
>> maybe next week if nobody beats me to it.
>
> The results of my attempts so far to create a PPC VM:
> * I have not found a CPU / architecture / CD bus type combination for which the
> Gentoo installation CD was able to recognize the boot medium
> (http://distfiles.gentoo.org/releases/ppc/autobuilds/20140713/install-powerpc-minimal-20140713.iso).
> * Same problem with the openSUSE Leap CD: apparently that software does not recognize
> the qemu SCSI CD
> (http://mirror.datto.com/opensuse/ports/ppc/distribution/leap/42.2/iso/openSUSE-Leap-42.2-NET-ppc64le-Build0156-Media.iso).
> * openSUSE Tumbleweed recognizes the qemu SCSI CD. After it has loaded the installation
> software, however, it displays a message reporting that an error occurred during
> installation and displays a menu. Making a selection from that menu is not possible
> because the only key from the keyboard it recognizes is "Enter"
> (http://mirror.datto.com/opensuse/ports/ppc/tumbleweed/iso/openSUSE-Tumbleweed-DVD-ppc64-Snapshot20170303-Media.iso).
>
> Does anyone have a suggestion for how to proceed?
I was able to create a PPC emulation with debian-8.7.1-powerpc-CD-1.iso
following the instructions in https://gmplib.org/~tege/qemu.html. My only
problem was that "-net tap" fails and I did not find any way to get networking
working.
After looking at the screen through a number of crashes, I have determined that
the top entry in the traceback comes from dmam_alloc_coherent(). I have not been
able to see the offset to determine which BUG_ON call in that routine is being
triggered.
I tried to modify panic() to see if I could keep the screen on longer after the
failure, but no joy so far.
Larry
On Mon, 2017-03-06 at 13:46 -0600, Larry Finger wrote:
> I was able to create a PPC emulation with debian-8.7.1-powerpc-CD-1.iso
> following the instructions in https://gmplib.org/~tege/qemu.html. My only
> problem was that "-net tap" fails and I did not find any way to get networking
> working.
>
> After looking at the screen through a number of crashes, I have determined that
> the top entry in the traceback comes from dmam_alloc_coherent(). I have not been
> able to see the offset to determine which BUG_ON call in that routine is being
> triggered.
>
> I tried to modify panic() to see if I could keep the screen on longer after the
> failure, but no joy so far.
I think the problem is this code in drivers/macintosh/macio_asic.c:
#ifdef CONFIG_PCI
	/* Set the DMA ops to the ones from the PCI device, this could be
	 * fishy if we didn't know that on PowerMac it's always direct ops
	 * or iommu ops that will work fine
	 *
	 * To get all the fields, copy all archdata
	 */
	dev->ofdev.dev.archdata = chip->lbus.pdev->dev.archdata;
#endif /* CONFIG_PCI */
This is definitely bad. A quick fix is to also copy the new dev->dma_ops field
(in addition to archdata, since there is still stuff in archdata that we need).
A better long term fix is to have a set of macio_dma_ops wrappers that do
"the right thing".
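
To make the failure mode concrete, here is a minimal user-space sketch of why
the struct copy stopped being sufficient once dma_ops moved out of archdata.
The struct layouts are simplified stand-ins, not the real kernel definitions,
and macio_inherit_dma() is a hypothetical helper name used only for
illustration. In the driver itself the quick fix would presumably be a single
extra assignment of chip->lbus.pdev->dev.dma_ops next to the existing archdata
copy.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types (assumption: NOT the real layouts). */
struct dma_map_ops {
	const char *name;
};

struct dev_archdata {
	void *iommu_table_base;		/* placeholder for remaining arch fields */
	unsigned long dma_offset;
};

struct device {
	/* After commit 5657933dbb6e the ops pointer lives here, not in archdata. */
	const struct dma_map_ops *dma_ops;
	struct dev_archdata archdata;
};

/*
 * Hypothetical helper: let a macio child inherit the DMA setup of the
 * parent PCI device.
 */
static void macio_inherit_dma(struct device *child, const struct device *parent)
{
	assert(child != NULL && parent != NULL);

	/* What the existing code does: copies everything still in archdata. */
	child->archdata = parent->archdata;

	/* The quick fix: dma_ops is no longer part of archdata, so copy it too. */
	child->dma_ops = parent->dma_ops;
}
```

Without the second assignment the child's dma_ops stays NULL in this model,
which would be consistent with a BUG_ON firing in the allocation path, though
the thread does not confirm which check is hit.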
Cheers,
Ben.
On 03/06/2017 03:48 PM, Benjamin Herrenschmidt wrote:
> On Mon, 2017-03-06 at 13:46 -0600, Larry Finger wrote:
>> I was able to create a PPC emulation with debian-8.7.1-powerpc-CD-1.iso
>> following the instructions in https://gmplib.org/~tege/qemu.html. My only
>> problem was that "-net tap" fails and I did not find any way to get networking
>> working.
>>
>> After looking at the screen through a number of crashes, I have determined that
>> the top entry in the traceback comes from dmam_alloc_coherent(). I have not been
>> able to see the offset to determine which BUG_ON call in that routine is being
>> triggered.
>>
>> I tried to modify panic() to see if I could keep the screen on longer after the
>> failure, but no joy so far.
>
> I think the problem is this code in drivers/macintosh/macio_asic.c:
>
> #ifdef CONFIG_PCI
> /* Set the DMA ops to the ones from the PCI device, this could be
> * fishy if we didn't know that on PowerMac it's always direct ops
> * or iommu ops that will work fine
> *
> * To get all the fields, copy all archdata
> */
> dev->ofdev.dev.archdata = chip->lbus.pdev->dev.archdata;
> #endif /* CONFIG_PCI */
>
> This is definitely bad. A quick fix is to also copy the new dev->dma_ops field
> (in addition to archdata, since there is still stuff in archdata that we need).
>
> A better long term fix is to have a set of macio_dma_ops wrappers that do
> "the right thing".
The one-line fix that copies dma_ops does indeed fix the problem.
What do you want to do from here? I could prepare a quick-and-dirty patch to
resolve the regression, or would you prefer to do "the right thing" now?
Larry
On 03/06/2017 03:48 PM, Benjamin Herrenschmidt wrote:
> On Mon, 2017-03-06 at 13:46 -0600, Larry Finger wrote:
>> I was able to create a PPC emulation with debian-8.7.1-powerpc-CD-1.iso
>> following the instructions in https://gmplib.org/~tege/qemu.html. My only
>> problem was that "-net tap" fails and I did not find any way to get networking
>> working.
>>
>> After looking at the screen through a number of crashes, I have determined that
>> the top entry in the traceback comes from dmam_alloc_coherent(). I have not been
>> able to see the offset to determine which BUG_ON call in that routine is being
>> triggered.
>>
>> I tried to modify panic() to see if I could keep the screen on longer after the
>> failure, but no joy so far.
>
> I think the problem is this code in drivers/macintosh/macio_asic.c:
>
> #ifdef CONFIG_PCI
> /* Set the DMA ops to the ones from the PCI device, this could be
> * fishy if we didn't know that on PowerMac it's always direct ops
> * or iommu ops that will work fine
> *
> * To get all the fields, copy all archdata
> */
> dev->ofdev.dev.archdata = chip->lbus.pdev->dev.archdata;
> #endif /* CONFIG_PCI */
>
> This is definitely bad. A quick fix is to also copy the new dev->dma_ops field
> (in addition to archdata, since there is still stuff in archdata that we need).
>
> A better long term fix is to have a set of macio_dma_ops wrappers that do
> "the right thing".
Ben,
Attached is a patch that fixes the crash. At the moment, it has my s-o-b, but I
do not feel it right to claim authorship. My role should be a
Reported-and-tested-by. Please advise.
Larry
On Thu, 2017-03-09 at 16:22 -0600, Larry Finger wrote:
>
>
> Attached is a patch that fixes the crash. At the moment, it has my s-o-b, but I
> do not feel it right to claim authorship. My role should be a
> Reported-and-tested-by. Please advise.
Nah, you wrote the patch, your s-o-b is fine ;-) Can you submit it
properly? I.e., as an email with the patch inline rather than as an
attachment, and with the [PATCH] ... subject line?
Cheers,
Ben.