Date: Thu, 30 Dec 2010 11:40:51 -0500
From: Konrad Rzeszutek Wilk
To: Joe Jin
Cc: jeremy@goop.org, ian.campbell@citrix.com, Andrew Morton,
    linux-fbdev@vger.kernel.org, xen-devel@lists.xensource.com,
    linux-kernel@vger.kernel.org, gurudas.pai@oracle.com,
    greg.marsden@oracle.com, guru.anbalagane@oracle.com
Subject: Re: [patch] xenfb: fix xenfb suspend/resume race
Message-ID: <20101230164051.GC24313@dumpdata.com>
In-Reply-To: <20101230125616.GA31537@joejin-pc.cn.oracle.com>
References: <20101230125616.GA31537@joejin-pc.cn.oracle.com>
User-Agent: Mutt/1.5.20 (2009-06-14)

On Thu, Dec 30, 2010 at 08:56:16PM +0800, Joe Jin wrote:
> Hi,

Joe,

Patch looks good; however, I am unclear from your description whether
the patch fixes the problem (I would presume so). Or does it take a
long time to hit this race?

>
> When doing migration tests, we hit the panic below:
>
> <1>BUG: unable to handle kernel paging request at 0000000b819fdb98
> <1>IP: [] notify_remote_via_irq+0x13/0x34
> <4>PGD 94b10067 PUD 0
> <0>Oops: 0000 [#1] SMP
> <0>last sysfs file: /sys/class/misc/autofs/dev
> <4>CPU 3
> <4>Modules linked in: autofs4(U) hidp(U) nfs(U) fscache(U) nfs_acl(U)
> auth_rpcgss(U) rfcomm(U) l2cap(U) bluetooth(U) rfkill(U) lockd(U) sunrpc(U)
> nf_conntrack_netbios_ns(U) ipt_REJECT(U) nf_conntrack_ipv4(U)
> nf_defrag_ipv4(U) xt_state(U) nf_conntrack(U) iptable_filter(U) ip_tables(U)
> ip6t_REJECT(U) xt_tcpudp(U) ip6table_filter(U) ip6_tables(U) x_tables(U)
> ipv6(U) parport_pc(U) lp(U) parport(U) snd_seq_dummy(U) snd_seq_oss(U)
> snd_seq_midi_event(U) snd_seq(U) snd_seq_device(U) snd_pcm_oss(U)
> snd_mixer_oss(U) snd_pcm(U) snd_timer(U) snd(U) soundcore(U)
> snd_page_alloc(U) joydev(U) xen_netfront(U) pcspkr(U) xen_blkfront(U)
> uhci_hcd(U) ohci_hcd(U) ehci_hcd(U)
> Pid: 18, comm: events/3 Not tainted 2.6.32
> RIP: e030:[] [] notify_remote_via_irq+0x13/0x34
> RSP: e02b:ffff8800e7bf7bd0 EFLAGS: 00010202
> RAX: ffff8800e61c8000 RBX: ffff8800e62f82c0 RCX: 0000000000000000
> RDX: 00000000000001e3 RSI: ffff8800e7bf7c68 RDI: 0000000bfffffff4
> RBP: ffff8800e7bf7be0 R08: 00000000000001e2 R09: ffff8800e62f82c0
> R10: 0000000000000001 R11: ffff8800e6386110 R12: 0000000000000000
> R13: 0000000000000007 R14: ffff8800e62f82e0 R15: 0000000000000240
> FS: 00007f409d3906e0(0000) GS:ffff8800028b8000(0000) knlGS:0000000000000000
> CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000000b819fdb98 CR3: 000000003ee3b000 CR4: 0000000000002660
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process events/3 (pid: 18, threadinfo ffff8800e7bf6000, task ffff8800e7bf4540)
> Stack:
>  0000000000000200 ffff8800e61c8000 ffff8800e7bf7c00 ffffffff812712c9
> <0> ffffffff8100ea5f ffffffff81438d80 ffff8800e7bf7cd0 ffffffff812714ee
> <0> 0000000000000000 ffffffff81270568 000000000000e030 0000000000010202
> Call Trace:
> [] xenfb_send_event+0x5c/0x5e
> [] ? xen_restore_fl_direct_end+0x0/0x1
> [] ? _spin_unlock_irqrestore+0x16/0x18
> [] xenfb_refresh+0x1b1/0x1d7
> [] ? sys_imageblit+0x1ac/0x458
> [] xenfb_imageblit+0x2f/0x34
> [] soft_cursor+0x1b5/0x1c8
> [] bit_cursor+0x4b6/0x4d7
> [] ? xen_restore_fl_direct_end+0x0/0x1
> [] ? _spin_unlock_irqrestore+0x16/0x18
> [] ? bit_cursor+0x0/0x4d7
> [] fb_flashcursor+0xff/0x111
> [] ? fb_flashcursor+0x0/0x111
> [] worker_thread+0x14d/0x1ed
> [] ? autoremove_wake_function+0x0/0x3d
> [] ? _spin_unlock_irqrestore+0x16/0x18
> [] ? worker_thread+0x0/0x1ed
> [] kthread+0x6e/0x76
> [] child_rip+0xa/0x20
> [] ? int_ret_from_sys_call+0x7/0x1b
> [] ? retint_restore_args+0x5/0x6
> [] ? child_rip+0x0/0x20
> Code: 6b ff 0c 8b 87 a4 db 9f 81 66 85 c0 74 08 0f b7 f8 e8 3b ff ff ff c9
> c3 55 48 89 e5 48 83 ec 10 0f 1f 44 00 00 89 ff 48 6b ff 0c <8b> 87 a4 db 9f
> 81 66 85 c0 74 14 48 8d 75 f0 0f b7 c0 bf 04 00
> RIP [] notify_remote_via_irq+0x13/0x34
> RSP
> CR2: 0000000b819fdb98
> ---[ end trace 098b4b74827595d0 ]---
> Kernel panic - not syncing: Fatal exception
> Pid: 18, comm: events/3 Tainted: G D 2.6.32
> Call Trace:
> [] ? card_probe+0x99/0x123
> [] panic+0xa5/0x162
> [] ? xen_restore_fl_direct_end+0x0/0x1
> [] ? _spin_unlock_irqrestore+0x16/0x18
> [] ? down_trylock+0x30/0x38
> [] ? card_probe+0x99/0x123
> [] ? console_unblank+0x23/0x6f
> [] ? print_oops_end_marker+0x23/0x25
> [] ? card_probe+0x99/0x123
> [] oops_end+0xb7/0xc7
> [] no_context+0x1f1/0x200
> [] ? card_probe+0x99/0x123
> [] __bad_area_nosemaphore+0x183/0x1a6
> [] ? extract_buf+0xbd/0x134
> [] ? pvclock_clocksource_read+0x47/0x9e
> [] bad_area_nosemaphore+0x13/0x15
> [] do_page_fault+0x147/0x26c
> [] page_fault+0x25/0x30
> [] ? notify_remote_via_irq+0x13/0x34
> [] xenfb_send_event+0x5c/0x5e
> [] ? xen_restore_fl_direct_end+0x0/0x1
> [] ? _spin_unlock_irqrestore+0x16/0x18
> [] xenfb_refresh+0x1b1/0x1d7
> [] ? sys_imageblit+0x1ac/0x458
> [] xenfb_imageblit+0x2f/0x34
> [] soft_cursor+0x1b5/0x1c8
> [] bit_cursor+0x4b6/0x4d7
> [] ? xen_restore_fl_direct_end+0x0/0x1
> [] ? _spin_unlock_irqrestore+0x16/0x18
> [] ? bit_cursor+0x0/0x4d7
> [] fb_flashcursor+0xff/0x111
> [] ? fb_flashcursor+0x0/0x111
> [] worker_thread+0x14d/0x1ed
> [] ? autoremove_wake_function+0x0/0x3d
> [] ? _spin_unlock_irqrestore+0x16/0x18
> [] ? worker_thread+0x0/0x1ed
> [] kthread+0x6e/0x76
> [] child_rip+0xa/0x20
> [] ? int_ret_from_sys_call+0x7/0x1b
> [] ? retint_restore_args+0x5/0x6
> [] ? child_rip+0x0/0x20
> [] ? child_rip+0x0/0x20
>
> Checking the source, this appears to be caused by the kernel trying to
> use xenfb before it is ready when resuming.
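
A minimal userspace sketch of the window being described, under the
assumption that the cursor-blink worker can run while the frontend is
still reconnecting. All names here (evtchn_of_irq, notify_sim,
xenfb_sim) are illustrative stand-ins, not the driver's or the kernel's
real symbols; the real notify_remote_via_irq() has no such bounds check,
which is why indexing its per-irq data with a stale irq faults:

    #include <stdio.h>

    #define NR_IRQS 16

    /* Stand-in for the per-irq event-channel table that
     * notify_remote_via_irq() indexes in the oops above. */
    static int evtchn_of_irq[NR_IRQS];

    static void notify_sim(int irq)
    {
            /* In the kernel there is no such check: a stale or unbound
             * irq indexes the table out of bounds, giving the wild
             * pointer dereference at 0000000b819fdb98. */
            if (irq < 0 || irq >= NR_IRQS) {
                    printf("BUG: notify with invalid irq %d\n", irq);
                    return;
            }
            printf("notify backend via irq %d (evtchn %d)\n",
                   irq, evtchn_of_irq[irq]);
    }

    struct xenfb_sim { int irq; };

    int main(void)
    {
            struct xenfb_sim info = { .irq = -1 }; /* disconnected for resume */

            /* fb_flashcursor fires before xenfb_connect_backend() has
             * finished: info.irq does not yet name a bound channel. */
            notify_sim(info.irq);

            /* Reconnect completes: bind the channel, publish irq last. */
            evtchn_of_irq[3] = 42;
            info.irq = 3;

            notify_sim(info.irq); /* safe now */
            return 0;
    }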
>
> Below is the potential fix, please review it.
>
> Signed-off-by: Joe Jin
> Cc: Ian Campbell
> Cc: Jeremy Fitzhardinge
> Cc: Andrew Morton
>
> ---
>  xen-fbfront.c |   19 +++++++++++--------
>  1 file changed, 11 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/video/xen-fbfront.c b/drivers/video/xen-fbfront.c
> index dc72563..367fb1c 100644
> --- a/drivers/video/xen-fbfront.c
> +++ b/drivers/video/xen-fbfront.c
> @@ -561,26 +561,24 @@ static void xenfb_init_shared_page(struct xenfb_info *info,
>  static int xenfb_connect_backend(struct xenbus_device *dev,
>                                   struct xenfb_info *info)
>  {
> -        int ret, evtchn;
> +        int ret, evtchn, irq;
>          struct xenbus_transaction xbt;
>
>          ret = xenbus_alloc_evtchn(dev, &evtchn);
>          if (ret)
>                  return ret;
> -        ret = bind_evtchn_to_irqhandler(evtchn, xenfb_event_handler,
> +        irq = bind_evtchn_to_irqhandler(evtchn, xenfb_event_handler,
>                                          0, dev->devicetype, info);
> -        if (ret < 0) {
> +        if (irq < 0) {
>                  xenbus_free_evtchn(dev, evtchn);
>                  xenbus_dev_fatal(dev, ret, "bind_evtchn_to_irqhandler");
> -                return ret;
> +                return irq;
>          }
> -        info->irq = ret;
> -
>  again:
>          ret = xenbus_transaction_start(&xbt);
>          if (ret) {
>                  xenbus_dev_fatal(dev, ret, "starting transaction");
> -                return ret;
> +                goto unbind_irq;
>          }
>          ret = xenbus_printf(xbt, dev->nodename, "page-ref", "%lu",
>                              virt_to_mfn(info->page));
> @@ -602,15 +600,20 @@ static int xenfb_connect_backend(struct xenbus_device *dev,
>                  if (ret == -EAGAIN)
>                          goto again;
>                  xenbus_dev_fatal(dev, ret, "completing transaction");
> -                return ret;
> +                goto unbind_irq;
>          }
>
>          xenbus_switch_state(dev, XenbusStateInitialised);
> +        info->irq = irq;
>          return 0;
>
>  error_xenbus:
>          xenbus_transaction_end(xbt, 1);
>          xenbus_dev_fatal(dev, ret, "writing xenstore");
> + unbind_irq:
> +        printk(KERN_ERR "xenfb_connect_backend failed!\n");
> +        unbind_from_irqhandler(irq, info);
> +        xenbus_free_evtchn(dev, evtchn);
>          return ret;
>  }
>
> --
> Oracle
> Joe Jin | Team Leader, Software Development | +8610.8278.6295
> ORACLE | Linux and Virtualization
> Incubator Building 2-A ZPark | Beijing China, 100094
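
The shape of the fix is the usual kernel goto-unwind: tear resources
down in reverse order of setup on failure, and publish info->irq only
once the backend connection is fully established. A standalone sketch
of that pattern, where alloc_evtchn, bind_irq and do_transaction are
hypothetical stand-ins for the xenbus calls, not the real API:

    #include <stdio.h>

    /* Illustrative stubs; just enough to show the unwind ordering. */
    static int alloc_evtchn(void)        { return 7; }
    static int bind_irq(int evtchn)      { (void)evtchn; return 3; }
    static int do_transaction(void)      { return -1; } /* force failure */
    static void unbind_irq(int irq)      { printf("unbound irq %d\n", irq); }
    static void free_evtchn(int evtchn)  { printf("freed evtchn %d\n", evtchn); }

    static int connect_backend(int *published_irq)
    {
            int evtchn, irq, ret;

            evtchn = alloc_evtchn();
            irq = bind_irq(evtchn);

            ret = do_transaction();
            if (ret)
                    goto unbind; /* unwind in reverse order of setup */

            *published_irq = irq; /* publish only once fully connected */
            return 0;

    unbind:
            unbind_irq(irq);
            free_evtchn(evtchn);
            return ret;
    }

    int main(void)
    {
            int irq = -1;

            if (connect_backend(&irq))
                    printf("connect failed; irq still %d, never published\n", irq);
            return 0;
    }

Because the failure paths now unbind the handler and free the event
channel, and the irq is assigned to info->irq last, no other code path
can observe a half-constructed connection.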