2008-06-16 22:05:53

by Jeremy Fitzhardinge

Subject: "fb-defio: fix page list with concurrent processes"

Your patch "fb-defio: fix page list with concurrent processes"
definitely seems to help with the suspend/resume problem I had with the
Xen pvfb device. Is it queued up anywhere? It seems to be a real
bugfix, and should probably be queued for 2.6.26...

J

fb-defio: fix page list with concurrent processes

From: Jaya Kumar <[email protected]>

Hi Tony, Geert, Andrew, fbdev,

This patch is a bugfix for how defio handles multiple processes manipulating
the same framebuffer. Thanks to Bernard Blackham for identifying this bug.
It occurs when two applications mmap the same framebuffer and concurrently
write to the same page. Normally, this doesn't occur since only a single
process mmaps the framebuffer. The symptom of the bug is that the mapping
applications will hang. The cause is that defio incorrectly tries to add the
same page twice to the pagelist. The solution I have is to walk the pagelist
and check for a duplicate before adding. Since I needed to walk the
pagelist, I now also keep the pagelist in sorted order.

Thanks,
jaya

Signed-off-by: Jaya Kumar <[email protected]>

---
drivers/video/fb_defio.c | 20 +++++++++++++++++++-
1 file changed, 19 insertions(+), 1 deletion(-)

===================================================================
--- a/drivers/video/fb_defio.c
+++ b/drivers/video/fb_defio.c
@@ -74,6 +74,7 @@ static int fb_deferred_io_mkwrite(struct vm_area_struct *vma,
 {
 	struct fb_info *info = vma->vm_private_data;
 	struct fb_deferred_io *fbdefio = info->fbdefio;
+	struct page *cur;

 	/* this is a callback we get when userspace first tries to
 	write to the page. we schedule a workqueue. that workqueue
@@ -83,7 +84,24 @@ static int fb_deferred_io_mkwrite(struct vm_area_struct *vma,

 	/* protect against the workqueue changing the page list */
 	mutex_lock(&fbdefio->lock);
-	list_add(&page->lru, &fbdefio->pagelist);
+
+	/* we loop through the pagelist before adding in order
+	to keep the pagelist sorted */
+	list_for_each_entry(cur, &fbdefio->pagelist, lru) {
+		/* this check is to catch the case where a new
+		process could start writing to the same page
+		through a new pte. this new access can cause the
+		mkwrite even when the original ps's pte is marked
+		writable */
+		if (unlikely(cur == page))
+			goto page_already_added;
+		else if (cur->index > page->index)
+			break;
+	}
+
+	list_add_tail(&page->lru, &cur->lru);
+
+page_already_added:
 	mutex_unlock(&fbdefio->lock);

 	/* come back after delay to process the deferred IO */
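
For illustration, the walk-and-insert logic of the patch can be tried out in userspace. The following is a minimal sketch under stated assumptions, not the kernel code: struct list_node, struct fake_page and every helper name here are hypothetical stand-ins for <linux/list.h> and struct page, and the fbdefio->lock mutex is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct list_head (assumption). */
struct list_node {
	struct list_node *prev, *next;
};

/* Hypothetical stand-in for struct page: only the fields the patch uses. */
struct fake_page {
	unsigned long index;   /* page offset within the framebuffer */
	struct list_node lru;  /* link into the dirty-page list */
};

/* Recover the containing fake_page from its embedded lru link. */
#define node_to_page(n) \
	((struct fake_page *)((char *)(n) - offsetof(struct fake_page, lru)))

/* An empty circular list is a head that points at itself. */
void list_init(struct list_node *head)
{
	head->prev = head->next = head;
}

/* Link 'item' immediately before 'pos', as list_add_tail() does. */
void insert_before(struct list_node *item, struct list_node *pos)
{
	item->prev = pos->prev;
	item->next = pos;
	pos->prev->next = item;
	pos->prev = item;
}

/*
 * Add 'page' to the list headed by 'head', keeping the list sorted by
 * index. Returns 1 if the page was added, 0 if it was already queued.
 */
int pagelist_add_sorted(struct list_node *head, struct fake_page *page)
{
	struct list_node *pos;

	assert(head != NULL && page != NULL);

	for (pos = head->next; pos != head; pos = pos->next) {
		struct fake_page *cur = node_to_page(pos);

		if (cur == page)
			return 0;  /* duplicate: leave the list untouched */
		if (cur->index > page->index)
			break;     /* first larger entry: insert before it */
	}

	/* If the loop ran off the end, pos == head and this appends. */
	insert_before(&page->lru, pos);
	return 1;
}

/* Count the entries on the list. */
int pagelist_len(const struct list_node *head)
{
	const struct list_node *pos;
	int n = 0;

	for (pos = head->next; pos != head; pos = pos->next)
		n++;
	return n;
}
```

Calling pagelist_add_sorted() twice with the same page returns 0 the second time and leaves the list as it was, which is the case the patch guards against. The walk costs O(n) in the number of queued pages.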


2008-06-17 01:11:25

by Jaya Kumar

Subject: Re: "fb-defio: fix page list with concurrent processes"

On Mon, Jun 16, 2008 at 3:05 PM, Jeremy Fitzhardinge <[email protected]> wrote:
> Your patch "fb-defio: fix page list with concurrent processes" definitely
> seems to help with the suspend/resume problem I had with the Xen pvfb
> device. Is it queued up anywhere? It seems to be a real bugfix, and should
> probably be queued for 2.6.26...

It isn't currently queued. I had intended to improve its performance
by taking advantage of Andrew's suggestion of using !list_empty on the
page->lru to avoid walking the page list to find the duplicate page,
but I ran into trouble since the page starts off being on the lru
list. I'll try to take a look at doing this next weekend.

Thanks,
jaya
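
For illustration, the !list_empty idea mentioned above can be sketched in userspace C. This is a minimal sketch under stated assumptions, not the kernel code: struct list_node and the helper names are hypothetical stand-ins for <linux/list.h>, where the corresponding calls are list_empty(), list_add_tail() and list_del_init(). The check only works if each link starts out self-linked and is re-initialized when it is removed, which is the precondition that, per the message above, did not hold for page->lru.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct list_head (assumption). */
struct list_node {
	struct list_node *prev, *next;
};

/* A self-linked node is "on no list"; the same shape is an empty head. */
void node_init(struct list_node *n)
{
	n->prev = n->next = n;
}

/* Equivalent of list_empty() applied to a node rather than a head. */
int node_is_unlinked(const struct list_node *n)
{
	return n->next == n;
}

/* Append 'item' at the tail of the list headed by 'head'. */
void node_add_tail(struct list_node *item, struct list_node *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

/* Unlink 'item' and make it self-linked again, like list_del_init(). */
void node_del_init(struct list_node *item)
{
	item->prev->next = item->next;
	item->next->prev = item->prev;
	node_init(item);
}

/*
 * O(1) "queue at most once": returns 1 if 'item' was queued now,
 * 0 if it was already on a list. No list walk is needed.
 */
int queue_once(struct list_node *head, struct list_node *item)
{
	assert(head != NULL && item != NULL);

	if (!node_is_unlinked(item))
		return 0;
	node_add_tail(item, head);
	return 1;
}
```

The trade-off in this sketch is that entries are appended in arrival order, so the sorted pagelist that the walking version maintains is lost unless ordering is restored some other way.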



2008-06-17 07:35:00

by Markus Armbruster

Subject: Re: "fb-defio: fix page list with concurrent processes"

"Jaya Kumar" <[email protected]> writes:

> On Mon, Jun 16, 2008 at 3:05 PM, Jeremy Fitzhardinge <[email protected]> wrote:
>> Your patch "fb-defio: fix page list with concurrent processes" definitely
>> seems to help with the suspend/resume problem I had with the Xen pvfb
>> device. Is it queued up anywhere? It seems to be a real bugfix, and should
>> probably be queued for 2.6.26...
>
> It isn't currently queued. I had intended to improve its performance
> by taking advantage of Andrew's suggestion of using !list_empty on the
> page->lru to avoid walking the page list to find the duplicate page,
> but I ran into trouble since the page starts off being on the lru
> list. I'll try to take a look at doing this next weekend.
>
> Thanks,
> jaya

Well, we got a bug that makes the code useless in practice for us, and
a fix for it that's not quite as fast as it could be. Which is
better, somewhat slow code, or somewhat useless code? I'd like to see
the fix merged as soon as possible. You can always improve its
performance later.

2008-06-17 08:21:25

by Jaya Kumar

Subject: Re: "fb-defio: fix page list with concurrent processes"

On Tue, Jun 17, 2008 at 12:34 AM, Markus Armbruster <[email protected]> wrote:
> "Jaya Kumar" <[email protected]> writes:
>
>> On Mon, Jun 16, 2008 at 3:05 PM, Jeremy Fitzhardinge <[email protected]> wrote:
>>> Your patch "fb-defio: fix page list with concurrent processes" definitely
>>> seems to help with the suspend/resume problem I had with the Xen pvfb
>>> device. Is it queued up anywhere? It seems to be a real bugfix, and should
>>> probably be queued for 2.6.26...
>>
>> It isn't currently queued. I had intended to improve its performance
>> by taking advantage of Andrew's suggestion of using !list_empty on the
>> page->lru to avoid walking the page list to find the duplicate page,
>> but I ran into trouble since the page starts off being on the lru
>> list. I'll try to take a look at doing this next weekend.
>>
>> Thanks,
>> jaya
>
> Well, we got a bug that makes the code useless in practice for us, and
> a fix for it that's not quite as fast as it could be. Which is
> better, somewhat slow code, or somewhat useless code? I'd like to see
> the fix merged as soon as possible. You can always improve its
> performance later.
>

Ok, I didn't realize there was any time pressure. Keep in mind, I'm
just a person doing this stuff for fun on weekends not someone under
commercial pressures. Yup, I've got no problem if the old patch is
requeued and merged.

Thanks,
jaya

2008-06-17 09:31:52

by Markus Armbruster

Subject: Re: "fb-defio: fix page list with concurrent processes"

"Jaya Kumar" <[email protected]> writes:

> On Tue, Jun 17, 2008 at 12:34 AM, Markus Armbruster <[email protected]> wrote:
>> "Jaya Kumar" <[email protected]> writes:
>>
>>> On Mon, Jun 16, 2008 at 3:05 PM, Jeremy Fitzhardinge <[email protected]> wrote:
>>>> Your patch "fb-defio: fix page list with concurrent processes" definitely
>>>> seems to help with the suspend/resume problem I had with the Xen pvfb
>>>> device. Is it queued up anywhere? It seems to be a real bugfix, and should
>>>> probably be queued for 2.6.26...
>>>
>>> It isn't currently queued. I had intended to improve its performance
>>> by taking advantage of Andrew's suggestion of using !list_empty on the
>>> page->lru to avoid walking the page list to find the duplicate page,
>>> but I ran into trouble since the page starts off being on the lru
>>> list. I'll try to take a look at doing this next weekend.
>>>
>>> Thanks,
>>> jaya
>>
>> Well, we got a bug that makes the code useless in practice for us, and
>> a fix for it that's not quite as fast as it could be. Which is
>> better, somewhat slow code, or somewhat useless code? I'd like to see
>> the fix merged as soon as possible. You can always improve its
>> performance later.
>>
>
> Ok, I didn't realize there was any time pressure. Keep in mind, I'm
> just a person doing this stuff for fun on weekends not someone under
> commercial pressures. Yup, I've got no problem if the old patch is
> requeued and merged.
>
> Thanks,
> jaya

Hey, it's your own fault! If you wrote useless code in your spare
time, we wouldn't bother you ;->

Seriously, I appreciate your contributions, and I didn't mean to
pressure you. Just to explain why I think it makes sense to merge
your fix now, and performance improvements later.

2008-07-03 21:11:17

by Markus Armbruster

Subject: Re: "fb-defio: fix page list with concurrent processes"

"Jaya Kumar" <[email protected]> writes:

> On Mon, Jun 16, 2008 at 3:05 PM, Jeremy Fitzhardinge <[email protected]> wrote:
>> Your patch "fb-defio: fix page list with concurrent processes" definitely
>> seems to help with the suspend/resume problem I had with the Xen pvfb
>> device. Is it queued up anywhere? It seems to be a real bugfix, and should
>> probably be queued for 2.6.26...
>
> It isn't currently queued. I had intended to improve its performance
> by taking advantage of Andrew's suggestion of using !list_empty on the
> page->lru to avoid walking the page list to find the duplicate page,
> but I ran into trouble since the page starts off being on the lru
> list. I'll try to take a look at doing this next weekend.
>
> Thanks,
> jaya

Jaya, any news?

2008-07-03 23:44:50

by Jaya Kumar

Subject: Re: "fb-defio: fix page list with concurrent processes"

On Fri, Jul 4, 2008 at 5:10 AM, Markus Armbruster <[email protected]> wrote:
>
> Jaya, any news?
>

Hi Markus,

In the thread, as per your point:

> better, somewhat slow code, or somewhat useless code? I'd like to see
> the fix merged as soon as possible. You can always improve its
> performance later.

I responded:

> Yup, I've got no problem if the old patch is requeued and merged.

I'd assumed that you would requeue that patch.

Thanks,
jaya

2008-07-07 20:43:25

by Markus Armbruster

Subject: Re: "fb-defio: fix page list with concurrent processes"

"Jaya Kumar" <[email protected]> writes:

> On Fri, Jul 4, 2008 at 5:10 AM, Markus Armbruster <[email protected]> wrote:
>>
>> Jaya, any news?
>>
>
> Hi Markus,
>
> In the thread, as per your point:
>
>> better, somewhat slow code, or somewhat useless code? I'd like to see
>> the fix merged as soon as possible. You can always improve its
>> performance later.
>
> I responded:
>
>> Yup, I've got no problem if the old patch is requeued and merged.
>
> I'd assumed that you would requeue that patch.
>
> Thanks,
> jaya

Unfortunately, I missed that assumption of yours %-}

Still want me to post it for you?

2008-07-08 00:54:50

by Jaya Kumar

Subject: Re: "fb-defio: fix page list with concurrent processes"

On Mon, Jul 7, 2008 at 4:43 PM, Markus Armbruster <[email protected]> wrote:
> Unfortunately, I missed that assumption of yours %-}
>
> Still want me to post it for you?
>

No worries, will post shortly. Sorry for the confusion.

Thanks,
jaya