Subject: Re: [PATCH] tty: fix data race in n_tty_receive_buf_common
To: Alan Cox
Cc: jslaby@suse.com, gregkh@linuxfoundation.org, mikey@neuling.org, linux-kernel@vger.kernel.org, linux-arm-msm@vger.kernel.org
References: <1514987332-14122-1-git-send-email-gkohli@codeaurora.org> <20180103193807.465e054e@alans-desktop> <0a456419-c836-08cf-070b-a254fb702b75@codeaurora.org> <20180104110920.169a1fe5@alans-desktop>
From: "Kohli, Gaurav"
Message-ID: <0dbd1f05-4c94-d1cc-3858-7bd4d38b9212@codeaurora.org>
Date: Thu, 4 Jan 2018 19:16:46 +0530
In-Reply-To: <20180104110920.169a1fe5@alans-desktop>

> Which tty driver ? serial/msm_serial.c ?

We are using our internal driver, msm_geni_serial.c.

>
> Ok no what I need to see is a trace of what each CPU is doing at the
> point you detect the problem. That way we can see what the path that
> races is.

Below is the stack trace of init, running on one core:

-006|n_tty_open(
    |    tty = 0xFFFFFFFF477AC880 -> (
    |      disc_data = 0xFFFFFF80197AD000,
    |      port = 0xFFFFFFFFEDE40000))
    |  ldata = 0xFFFFFF80197AD000
    |  trace_printk_fmt = 0xFFFFFF9F275125F8
-007|tty_ldisc_open.isra.3(
    |    tty = 0xFFFFFFFF477AC880)
-008|tty_ldisc_setup(
-009|tty_init_dev(
    |    driver = 0xFFFFFFFFEDE2A480,
    |    idx = 0)
-010|tty_open_by_driver(inline)
-010|tty_open(

Core 2:

-000|n_tty_receive_buf_common(
    |    tty = 0xFFFFFFFF477AC880,
    |  ?)
    |  ldata_=_0x0
    |  __func__ = (110, 95, 116, 116, 121, 95, 114, 101, 99, 101, 105, 118, 101, 95, 98, 117, 102, 95, 99, 111, 109, 109, 111, 110, 0)
    |  __u = (__val = 7079195495121566464, __c = (0))
    |  c = 127
    |  ldata = 0xFFFFFFFFF40DF97C
    |  c = 0
    |  ldata = 0xFFFFFF9F26F46000
-001|n_tty_receive_buf2(
    |    tty = 0xFFFFFFFF477AC880,
-002|tty_ldisc_receive_buf(inline)
-002|receive_buf(inline)
-002|flush_to_ldisc(

Please let me know if any other trace is required.

>> We have seen this issue on 4.9 and also one thing i have observed,
>> before tty is getting reinit in tty_init_dev(),
> When you stop the DMA is it instantaneous or does it cause a final
> interrupt after you return from stop_rx ?
>
> To me it still looks like data is being queued after the port is told to
> stop but that's not a certainty.

This geni driver is FIFO based. I have also added ftrace points, and from
them we can see that the open has not yet finished when a flush request
arrives:

         kworker/-15514   2.... 35206.979226: workqueue_execute_start: work struct 0xffffffffede40008: function flush_to_ldisc
         kworker/-15514   2.... 35206.979237: bprint: n_tty_receive_buf_common: n_tty_receive_buf_common tty=0xffffffff477ac880 ldata=(nil)
                   init-1       4.... 35206.979751: bprint:               n_tty_open: n_tty_open tty=0xffffffff477ac880 ldata=0xffffff80197ad000

>> there is console service exited before it in all the dumps.
>>  35206.969644:   <2> init: Service 'console' (pid 7440) exited with
>> status 130
>>  35206.969690:   <2> init: Sending signal 9 to service 'console' (pid
>> 7440) process group...
>>  35206.970857:   <2> init: kill(7440, 9) failed: No such process.
>>
>> So how can we stop request of receive buff, if we already have tty_port
>> and tty is getting reinitialized in midway like above
>> case?
> Is the port your console device? If you use a different port as a console
> device does the problem go away - that could be a very important detail
> as the hangup behaviour for the two is quite different.

Yes, we are using ttymsm0 as the console device, and it is the only port we
are using. So it seems that we receive a flush request while init is
reinitializing the tty for the same port.

Please let me know if any other debug logs are required.

Regards
Gaurav

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.