2005-03-15 10:38:35

by Dave Airlie

Subject: drm lockups since 2.6.11-bk2


Hi all,
Andrew Clayton reported lockup issues on the dri list since -bk2
and bug 4337 on bugzilla.kernel.org looks like it might be the same
thing..

This leads me to think the AGP multi-bridge patches are at fault... (for
once my laziness in merging late instead of early gave a good gap in the
patches...)

I'm "offline" in the sense that I can write this mail and respond but have no
access to a Linux system, my bitkeeper trees, or ssh keys for anywhere of
interest.. and am in the wrong country, it'll be the 23rd/24th before I am
back at my desks...

I might get time to do a code review, my main worry is that all the
problems reported with those patches in -mm made it into the patchset that
went into Linus.. mainly things like forgetting to memset certain
structures to 0 and sillies like that...

Dave.

--
David Airlie, Software Engineer
http://www.skynet.ie/~airlied / airlied at skynet.ie
Linux kernel - DRI, VAX / pam_smb / ILUG


2005-03-15 14:37:00

by Dave Jones

Subject: Re: drm lockups since 2.6.11-bk2

On Tue, Mar 15, 2005 at 10:38:30AM +0000, Dave Airlie wrote:
>
> Hi all,
> Andrew Clayton reported lockup issues on the dri list since -bk2
> and bug 4337 on bugzilla.kernel.org looks like it might be the same
> thing..
>
> This leads me to think the AGP multi-bridge patches are at fault... (for
> once my laziness in merging late instead of early gave a good gap in the
> patches...)
>
> I'm "offline" in the sense that I can write this mail and respond but have no
> access to a Linux system, my bitkeeper trees, or ssh keys for anywhere of
> interest.. and am in the wrong country, it'll be the 23rd/24th before I am
> back at my desks...
>
> I might get time to do a code review, my main worry is that all the
> problems reported with those patches in -mm made it into the patchset that
> went into Linus.. mainly things like forgetting to memset certain
> structures to 0 and sillies like that...

I saw one report where the recent drm security hole fix broke dri
for one user. Whilst it seems an isolated incident, could this have
more impact than we first realised ?

Worst case scenario we can drop out the multi-bridge support for now
if it needs work. Mike left SGI now, so we'll need to find someone else
with access to a Prism to make sure it still works correctly on a
real multi-gart system.

Dave

2005-03-15 16:18:40

by Dave Airlie

Subject: Re: drm lockups since 2.6.11-bk2


> >
> > I might get time to do a code review, my main worry is that all the
> > problems reported with those patches in -mm made it into the patchset that
> > went into Linus.. mainly things like forgetting to memset certain
> > structures to 0 and sillies like that...
>
> I saw one report where the recent drm security hole fix broke dri
> for one user. Whilst it seems an isolated incident, could this have
> more impact than we first realised ?

the radeon security changes? I've gotten no bad feedback on those, and neither
has dri-devel, so I've assumed they were all fine (usually radeon bug
reports get back fairly quickly as everyone has one ..),

the multi-bridge stuff is definitely broken as I've seen radeon and r128
reports on it .. and it looks most like 2.6.11-bk2 broke things and I
haven't merged anything until -bk7 ...

>
> Worst case scenario we can drop out the multi-bridge support for now
> if it needs work. Mike left SGI now, so we'll need to find someone else
> with access to a Prism to make sure it still works correctly on a
> real multi-gart system.

I'd like to make it work; I'm sure it is something small that's wrong, but I've
no access for > 1 week to my radeon machine so unless someone else picks
it up we may need to drop it for now..

Dave.

--
David Airlie, Software Engineer
http://www.skynet.ie/~airlied / airlied at skynet.ie
Linux kernel - DRI, VAX / pam_smb / ILUG

2005-03-15 16:42:19

by Dave Jones

Subject: Re: drm lockups since 2.6.11-bk2

On Tue, Mar 15, 2005 at 04:15:42PM +0000, Dave Airlie wrote:

> > I saw one report where the recent drm security hole fix broke dri
> > for one user. Whilst it seems an isolated incident, could this have
> > more impact than we first realised ?
>
> the radeon security changes? I've gotten no bad feedback on those, and neither
> has dri-devel, so I've assumed they were all fine (usually radeon bug
> reports get back fairly quickly as everyone has one ..),

The missing memset in setversion ioctl.
What sounded odd was that this was reproduced on 2.6.11.x, rather
than 2.6.11-bk, which has none of the AGP changes.
Could be a red herring though, as it was only one report.
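
For readers unfamiliar with this class of bug, the following is a minimal sketch of why a missing memset matters when a handler fills in only part of a structure. The structure and function names are hypothetical and are not the real DRM set-version code; the point is only that zeroing first gives every unassigned field a defined value.

```c
#include <string.h>

/* Hypothetical reply structure; NOT the real DRM set-version layout. */
struct version_reply {
    int iface_major;
    int iface_minor;
    int driver_major;
    int driver_minor;
};

/* Hypothetical handler that only knows the interface version. */
void fill_reply(struct version_reply *out, int iface_major, int iface_minor)
{
    /* Zero the whole structure so the driver_* fields, which are
     * not assigned below, do not carry over stale memory contents. */
    memset(out, 0, sizeof(*out));
    out->iface_major = iface_major;
    out->iface_minor = iface_minor;
}
```

Without the memset call, the two driver fields would keep whatever bytes the caller's buffer happened to contain.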

> > Worst case scenario we can drop out the multi-bridge support for now
> > if it needs work. Mike left SGI now, so we'll need to find someone else
> > with access to a Prism to make sure it still works correctly on a
> > real multi-gart system.
>
> I'd like to make it work; I'm sure it is something small that's wrong, but I've
> no access for > 1 week to my radeon machine so unless someone else picks
> it up we may need to drop it for now..

I'll try and dig into it over the next few days, but I'm swamped
in other stuff right now :-/

Dave

2005-03-15 16:54:07

by Dave Jones

Subject: Re: drm lockups since 2.6.11-bk2

On Tue, Mar 15, 2005 at 04:15:42PM +0000, Dave Airlie wrote:
>
> > >
> > > I might get time to do a code review, my main worry is that all the
> > > problems reported with those patches in -mm made it into the patchset that
> > > went into Linus.. mainly things like forgetting to memset certain
> > > structures to 0 and sillies like that...
> >
> > I saw one report where the recent drm security hole fix broke dri
> > for one user. Whilst it seems an isolated incident, could this have
> > more impact than we first realised ?
>
> the radeon security changes? I've gotten no bad feedback on those, and neither
> has dri-devel, so I've assumed they were all fine (usually radeon bug
> reports get back fairly quickly as everyone has one ..),
>
> the multi-bridge stuff is definitely broken as I've seen radeon and r128
> reports on it .. and it looks most like 2.6.11-bk2 broke things and I
> haven't merged anything until -bk7 ...

Wait, -bk2 broke things ? The big agp changes went into -bk3,
so this seems odd.

Dave

2005-03-15 16:57:02

by Dave Airlie

Subject: Re: drm lockups since 2.6.11-bk2

> > the multi-bridge stuff is definitely broken as I've seen radeon and r128
> > reports on it .. and it looks most like 2.6.11-bk2 broke things and I
> > haven't merged anything until -bk7 ...
>
> Wait, -bk2 broke things ? The big agp changes went into -bk3,
> so this seems odd.

sorry bk2-bk3 broke things... bk2 was okay..

Dave.
--
David Airlie, Software Engineer
http://www.skynet.ie/~airlied / airlied at skynet.ie
Linux kernel - DRI, VAX / pam_smb / ILUG

2005-03-15 17:00:41

by Andrew Clayton

Subject: Re: drm lockups since 2.6.11-bk2

On Tue, 2005-03-15 at 11:53 -0500, Dave Jones wrote:
> On Tue, Mar 15, 2005 at 04:15:42PM +0000, Dave Airlie wrote:
> >
> > > >
> > > > I might get time to do a code review, my main worry is that all the
> > > > problems reported with those patches in -mm made it into the patchset that
> > > > went into Linus.. mainly things like forgetting to memset certain
> > > > structures to 0 and sillies like that...
> > >
> > > I saw one report where the recent drm security hole fix broke dri
> > > for one user. Whilst it seems an isolated incident, could this have
> > > more impact than we first realised ?
> >
> > > the radeon security changes? I've gotten no bad feedback on those, and neither
> > has dri-devel, so I've assumed they were all fine (usually radeon bug
> > reports get back fairly quickly as everyone has one ..),
> >
> > the multi-bridge stuff is definitely broken as I've seen radeon and r128
> > reports on it .. and it looks most like 2.6.11-bk2 broke things and I
> > haven't merged anything until -bk7 ...
>
> Wait, -bk2 broke things ? The big agp changes went into -bk3,
> so this seems odd.
>

To clarify: 2.6.11-bk2 is working fine. It broke with 2.6.11-bk3, where
IIRC a drm update was made.

Disabling DRI in X and/or DRM in the kernel prevents X from locking the
machine.

> Dave
>

Cheers,

Andrew

2005-03-15 18:07:27

by Jesse Barnes

Subject: Re: drm lockups since 2.6.11-bk2

On Tuesday, March 15, 2005 6:36 am, Dave Jones wrote:
> I saw one report where the recent drm security hole fix broke dri
> for one user. Whilst it seems an isolated incident, could this have
> more impact than we first realised ?
>
> Worst case scenario we can drop out the multi-bridge support for now
> if it needs work. Mike left SGI now, so we'll need to find someone else
> with access to a Prism to make sure it still works correctly on a
> real multi-gart system.

I'd be happy to test and fix things, but the page table walker patches broke
ia64... Once that's cleared up I can go digging.

Jesse

2005-03-15 19:31:18

by Andrew Morton

Subject: Re: drm lockups since 2.6.11-bk2

Jesse Barnes <[email protected]> wrote:
>
> I'd be happy to test and fix things, but the page table walker patches broke
> ia64... Once that's cleared up I can go digging.

We're hoping that davem's fix (committed yesterday) fixed that.


ChangeSet 1.2181.1.2, 2005/03/14 21:16:17-08:00, [email protected]

[MM]: Restore pgd_index() iteration to clear_page_range().

Otherwise ia64 and sparc64 explode with the new ptwalk
iterators. The pgd level stuff does not handle virtual
address space holes (sparc64) and region based PGD indexing
(ia64) properly. It only matters in functions like
clear_page_range() which potentially walk over more than
a single VMA worth of address space.

Signed-off-by: David S. Miller <[email protected]>



memory.c | 10 +++++++---
1 files changed, 7 insertions(+), 3 deletions(-)


diff -Nru a/mm/memory.c b/mm/memory.c
--- a/mm/memory.c 2005-03-15 00:06:50 -08:00
+++ b/mm/memory.c 2005-03-15 00:06:50 -08:00
@@ -182,15 +182,19 @@
 				unsigned long addr, unsigned long end)
 {
 	pgd_t *pgd;
-	unsigned long next;
+	unsigned long i, next;
 
 	pgd = pgd_offset(tlb->mm, addr);
-	do {
+	for (i = pgd_index(addr); i <= pgd_index(end-1); i++) {
 		next = pgd_addr_end(addr, end);
 		if (pgd_none_or_clear_bad(pgd))
 			continue;
 		clear_pud_range(tlb, pgd, addr, next);
-	} while (pgd++, addr = next, addr != end);
+		pgd++;
+		addr = next;
+		if (addr == end)
+			break;
+	}
 }
 
 pte_t fastcall * pte_alloc_map(struct mm_struct *mm, pmd_t *pmd, unsigned long address)
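
A side note on the loop rewrite in the patch above, as a general C point rather than a diagnosis of any problem in this thread: inside a do/while, `continue` jumps to the controlling expression, so a comma expression there still advances the cursor. In a for loop whose advance statements sit at the end of the body, `continue` skips them. The sketch below shows the difference with hypothetical names (`struct slot`, `walk_do_while`, `walk_for_manual`); it is not kernel code.

```c
#include <stddef.h>

/* Hypothetical stand-in for a directory entry: empty slots are skipped. */
struct slot {
    int empty;
    int cleared;
};

/* do/while form: `continue` goes to the loop condition, which still
 * advances the cursor i via the comma expression. Requires n > 0. */
size_t walk_do_while(struct slot *s, size_t n)
{
    size_t i = 0, visited = 0;

    do {
        visited++;
        if (s[i].empty)
            continue;
        s[i].cleared = 1;
    } while (i++, i != n);
    return visited;
}

/* for form with the advance at the end of the body: `continue` skips
 * the i++ below, so the cursor stays parked on the empty slot. */
size_t walk_for_manual(struct slot *s, size_t n)
{
    size_t i = 0, k, visited = 0;

    for (k = 0; k < n; k++) {
        visited++;
        if (s[i].empty)
            continue;
        s[i].cleared = 1;
        i++;
    }
    return visited;
}
```

With three slots where only the middle one is empty, the do/while form clears the first and last slots, while the for form never reaches the last one.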


2005-03-15 19:38:25

by Jesse Barnes

Subject: Re: drm lockups since 2.6.11-bk2

On Tuesday, March 15, 2005 11:25 am, Andrew Morton wrote:
> Jesse Barnes <[email protected]> wrote:
> > I'd be happy to test and fix things, but the page table walker patches
> > broke ia64... Once that's cleared up I can go digging.
>
> We're hoping that davem's fix (committed yesterday) fixed that.
>
>
> ChangeSet 1.2181.1.2, 2005/03/14 21:16:17-08:00, [email protected]
>
> [MM]: Restore pgd_index() iteration to clear_page_range().

Yep, seems to have worked (at least my system boots). I only saw it in BK
today (I was waiting for a post to Tony's thread with the fix so I didn't see
it as soon as I might have).

Now to test AGP stuff.

Jesse

2005-03-15 23:28:44

by Andrew Morton

Subject: Re: drm lockups since 2.6.11-bk2

Jesse Barnes <[email protected]> wrote:
>
> > We're hoping that davem's fix (committed yesterday) fixed that.
> >
> >
> > ChangeSet 1.2181.1.2, 2005/03/14 21:16:17-08:00, [email protected]
> >
> > [MM]: Restore pgd_index() iteration to clear_page_range().
>
> Yep, seems to have worked (at least my system boots).

It causes ppc64 to oops unpleasantly so we're not quite there yet.