Date: Thu, 4 Jan 2018 11:09:20 +0000
From: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
To: "Kohli, Gaurav" <gkohli@codeaurora.org>
Cc: jslaby@suse.com, gregkh@linuxfoundation.org, mikey@neuling.org,
        linux-kernel@vger.kernel.org
Subject: Re: [PATCH] tty: fix data race in n_tty_receive_buf_common
Message-ID: <20180104110920.169a1fe5@alans-desktop>
In-Reply-To: <0a456419-c836-08cf-070b-a254fb702b75@codeaurora.org>
References: <1514987332-14122-1-git-send-email-gkohli@codeaurora.org>
        <20180103193807.465e054e@alans-desktop>
        <0a456419-c836-08cf-070b-a254fb702b75@codeaurora.org>
Organization: Intel Corporation

> > What does a full (all CPU) trace of the bug look like, and what tty driver
> > are you using when you capture the trace?

Which tty driver? serial/msm_serial.c?

> We are using tty for console logging,
>      |    tty = 0xFFFFFFFF477AC880 -> (
>      |      magic = 21505,
>      |      kref = (refcount = (counter = 2)),
>      |      dev = 0xFFFFFFFFEDE3DA80,
>      |      driver = 0xFFFFFFFFEDE2A480,
>      |      ops = 0xFFFFFF9F26F7D0D0,
>      |      index = 0,
>      |      ldisc_sem = (count = 1, wait_lock = (raw_lock = (owner = 0, next = 0), magic = 3735899821, own
>      |      termiox = 0x0,
>      |      name = "ttyMSM0",
>      |      pgrp = 0x0,

OK, no: what I need to see is a trace of what each CPU is doing at the
point you detect the problem. That way we can see which path is
racing.

> We have seen this issue on 4.9, and one thing I have observed is that,
> before the tty is reinitialized in tty_init_dev(),

When you stop the DMA, is it instantaneous, or does it cause a final
interrupt after you return from stop_rx?

To me it still looks like data is being queued after the port is told to
stop, but that's not certain.
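
A minimal userspace sketch of the suspected race, assuming the problem is a
late DMA-completion interrupt that queues data after stop_rx has returned.
Everything here is hypothetical: struct model_port, model_stop_rx() and
model_rx_complete() are illustrative names that do not exist in
msm_serial.c or n_tty.c, and the C11 atomic_flag spinlock merely stands in
for the port lock a real driver would hold. It shows the invariant being
asked about: if the "stopped" check and the queuing happen under the same
lock that stop_rx takes, nothing can be queued once stop_rx has returned.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of a UART receive path (not real driver code).
 * The atomic_flag spinlock stands in for the port spinlock.
 */
struct model_port {
	atomic_flag lock;   /* models the port spinlock */
	bool rx_stopped;    /* set by the stop_rx path */
	size_t queued;      /* bytes pushed towards the line discipline */
	size_t dropped;     /* bytes discarded because RX was already stopped */
};

static void model_lock(struct model_port *p)
{
	/* Spin until this caller is the one that set the flag. */
	while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
		;
}

static void model_unlock(struct model_port *p)
{
	atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

void model_port_init(struct model_port *p)
{
	atomic_flag_clear(&p->lock);
	p->rx_stopped = false;
	p->queued = 0;
	p->dropped = 0;
}

/* Models stop_rx: once this returns, nothing more may be queued. */
void model_stop_rx(struct model_port *p)
{
	model_lock(p);
	p->rx_stopped = true;
	model_unlock(p);
}

/*
 * Models a (possibly late) DMA-completion interrupt delivering n bytes.
 * The stopped flag is tested under the same lock model_stop_rx() takes,
 * so a completion that fires after stop_rx has returned is dropped
 * instead of being queued to a tty that may be mid-reinitialization.
 */
void model_rx_complete(struct model_port *p, size_t n)
{
	model_lock(p);
	if (p->rx_stopped)
		p->dropped += n;
	else
		p->queued += n;
	model_unlock(p);
}
```

Usage: after model_port_init(), a completion of 4 bytes is queued; after
model_stop_rx(), a further completion of 8 bytes is counted as dropped and
the queued total stays at 4. Without the shared lock, a completion could
test the flag as "running", lose the CPU while stop_rx returns, and then
queue its data anyway, which is the window suspected above. Whether a real
driver must also wait for an in-flight DMA callback to finish inside
stop_rx is exactly the open question.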

> the console service has exited just before it in all the dumps.
>   35206.969644:   <2> init: Service 'console' (pid 7440) exited with status 130
>   35206.969690:   <2> init: Sending signal 9 to service 'console' (pid 7440) process group...
>   35206.970857:   <2> init: kill(7440, 9) failed: No such process.
> 
> So how can we stop receive-buffer requests if we already have a tty_port
> and the tty is being reinitialized midway, as in the case above?

Is the port your console device? If you use a different port as the
console device, does the problem go away? That could be a very important
detail, as the hangup behaviour for the two is quite different.

Alan