RM500Q-AE : wwan0 (qmi_wwan): transmit queue 0 timed out

I have an RM500Q-AE (Rev: RM500QAEAAR11A02M4G) in a ZBT WG1608.
It works fine for around 3-4 days, then I get a watchdog timeout where the transmit queue times out, and the modem stops transmitting until I reboot the router.

[130433.958370] xhci-mtk 1e1c0000.xhci: ERROR unknown event type 37
[132114.142348] ------------[ cut here ]------------
[132114.147089] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:320 dev_watchdog+0x1ac/0x324
[132114.155438] NETDEV WATCHDOG: wwan0 (qmi_wwan): transmit queue 0 timed out
[132114.162279] Modules linked in: rt2800usb rt2800lib qcserial pppoe ppp_async option cdc_mbim
… cut the rest out …
[132114.324940] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.167 #0
[132114.373178] …
[132114.375701] Call Trace:
[132114.378247] [<800106a0>] show_stack+0x58/0x100
[132114.382777] [<80440c44>] dump_stack+0xa4/0xe0
[132114.387206] [<8002e958>] __warn+0xe0/0x114
[132114.391368] [<8002e9bc>] warn_slowpath_fmt+0x30/0x3c
[132114.396399] [<80361fd0>] dev_watchdog+0x1ac/0x324
[132114.401187] [<800871c4>] call_timer_fn.isra.3+0x24/0x84
[132114.406475] [<800873e0>] run_timer_softirq+0x1bc/0x248
[132114.411698] [<8045e230>] __do_softirq+0x128/0x2ec
[132114.416474] [<800330c4>] irq_exit+0xac/0xc8
[132114.420737] [<802473ec>] plat_irq_dispatch+0xfc/0x138
[132114.425853] [<8000b5e8>] except_vec_vi_end+0xb8/0xc4
[132114.430880] [<8000cfb0>] r4k_wait_irqoff+0x1c/0x24
[132114.435869] ---[ end trace 9822945857e5d943 ]---

This is on a WG1608 under ROOter + Goldenorb. I suspect it’s a modem firmware issue, but I don’t know any debug commands that would be useful for querying the modem when it’s in such a state.

Normal modem status is:
Temperature of modem: 32C
LBAND="B66 (Bandwidth 20 MHz Down | 20 MHz Up) n71, B2 (CA, Bandwidth 15 MHz)"
CSQ_RSSI="-55 dBm"
ECIO="-8 dB -15"
RSCP="-82 dBm (RxD -92 dBm) -44"
MODE="LTE FDD/NR EN-DC"

Sometimes I would see “Unknown nxx frequency” (instead of n71) under modem status (along with B66),
but now it’s just been showing B66 and no other frequencies when the modem transmit queue is hung.
For a while I thought it was power (I started with a 2.5 A supply for the WG1608 router it’s in), but I’m now running the router off a 4 A supply and still see the TX queue hangs.

Are there any debug AT commands that would be useful when the TX queue is hung?

Dear Thomas,
Thanks for your question. Did you buy a ZBT WG1608 router with an RM500Q-AE inside, and are you now having problems with it?
If so, it would be better to contact the router company instead of us, because we only produce the RM500Q-AE 5G module, not the router. The router company is our customer. If the problem is related to the RM500Q-AE, the router company will contact us. Thanks for your understanding!

I see. I will test it in a USB sled with some other system driving it. I strongly suspect the router vendor won’t care.
There is nothing wrong with the router, it is the RM500Q that has issues.

I can use a 4x4 MIMO antenna with the 4 antenna cables wired one way and get the TX hangs every 5 minutes. When I change the cable order from that arrangement (the correct one per the 4x4 MIMO antenna manual) to the incorrect one, I only see the TX hangs every 3-4 days.

I bought the RM500Q-AE separately from the router.

The RM500Q-AE had a TX hang after 2 days of uptime; I had to reboot the router to get it back.

AT+CFUN=1,1 //Reset the module. No luck with that

The main difference in AT cmd responses I could find were:
=== during TX hang
AT+QENG="servingcell"
+QENG: "servingcell","NOCONN"
+QENG: "LTE","FDD",310,260,603302,465,66786,66,5,5,ADFA,-79,-8,-51,17,255,-32768,44
+QENG:"NR5G-NSA",310,260,65535,-32768,-32768,-32768
where:
"LTE",<is_tdd>,,,,,,<freq_band_ind>,<UL_bandwidth>,<DL_bandwidth>,,,,,,,<tx_power>,
"NR5G-NSA",,,,,,,,

versus after reboot

=== router reboot
AT+QENG="servingcell"
+QENG: "servingcell","NOCONN"
+QENG: "LTE","FDD",310,260,603302,465,66786,66,5,5,ADFA,-84,-8,-50,16,12,20,-
+QENG:"NR5G-NSA",310,260,859,-84,0,-15,126490,71

Now as a programmer I’m suspicious when I see -32768 values (or a 255 or a 65535)
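
A rough way to spot those placeholder-looking values automatically is sketched below. It splits a +QENG response line on commas and reports the positions that hold -32768, 255 or 65535. The function name `suspect_fields` and the choice of suspect values are assumptions based only on the responses above, not on Quectel documentation, and a 255 or 65535 may be a legitimate value in some fields, so a hit is only a hint.

```python
# Values that looked suspicious in the responses above (assumed sentinels,
# not taken from Quectel documentation).
SUSPECT_VALUES = {"-32768", "255", "65535"}


def suspect_fields(qeng_line):
    """Return (index, value) pairs for fields that hold a suspect value."""
    # Drop the "+QENG" prefix, then split the comma-separated payload.
    _, _, payload = qeng_line.partition(":")
    # Strip whitespace and both straight and curly double quotes.
    fields = [f.strip().strip('"“”') for f in payload.split(",")]
    return [(i, f) for i, f in enumerate(fields) if f in SUSPECT_VALUES]


hang_lte = '+QENG: "LTE","FDD",310,260,603302,465,66786,66,5,5,ADFA,-79,-8,-51,17,255,-32768,44'
hang_nr = '+QENG:"NR5G-NSA",310,260,65535,-32768,-32768,-32768'

print(suspect_fields(hang_lte))
print(suspect_fields(hang_nr))
```

Running the same helper on the after-reboot lines should return an empty list, which makes the two states easy to tell apart in a script.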

Hello Thomas,
-32768 means an invalid parameter; the module is working in LTE mode, not NR mode. Please check your module firmware version by sending the ATI command. Thanks.

You mean the firmware revision that I posted on the first line of my starting post?
(Rev: RM500QAEAAR11A02M4G)

AT+GMR
RM500QAEAAR11A02M4G

and

ATI
Quectel
RM500Q-AE
Revision: RM500QAEAAR11A02M4G
OK

Note: I see a -32768 under the "LTE","FDD" results as well,
not only under the "NR5G-NSA" results.

It is a modem firmware issue.

The “transmit queue 0 timed out” warning from the Linux kernel indicates that the network interface in the USB device stopped responding; e.g. the kernel asked to read data from the USB device, the USB device is silent.
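
For anyone who wants to log how often this happens, here is a minimal sketch that pulls the watchdog warnings out of saved `dmesg` output. The regular expression is written against the log lines quoted in this thread, and the function name `find_tx_timeouts` is hypothetical; other kernel versions may word the message differently.

```python
import re

# Matches the kernel's "NETDEV WATCHDOG" warning as it appears in the log above.
WATCHDOG_RE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+NETDEV WATCHDOG: "
    r"(?P<iface>\S+) \((?P<driver>[^)]+)\): transmit queue (?P<queue>\d+) timed out"
)


def find_tx_timeouts(dmesg_text):
    """Return one dict per transmit-queue timeout found in dmesg output."""
    events = []
    for match in WATCHDOG_RE.finditer(dmesg_text):
        events.append({
            "uptime_s": float(match.group("uptime")),
            "iface": match.group("iface"),
            "driver": match.group("driver"),
            "queue": int(match.group("queue")),
        })
    return events


sample = """\
[132114.147089] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:320 dev_watchdog+0x1ac/0x324
[132114.155438] NETDEV WATCHDOG: wwan0 (qmi_wwan): transmit queue 0 timed out
"""

for event in find_tx_timeouts(sample):
    days = event["uptime_s"] / 86400  # seconds since boot, converted to days
    print(f'{event["iface"]} ({event["driver"]}) queue {event["queue"]} '
          f'stalled after {days:.1f} days of uptime')
```

The timestamp in brackets is seconds since boot, so dividing by 86400 gives the uptime at which the queue stalled.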

When that issue happens and the kernel warns about the network interface being non-responsive, it is also very very very very likely that the AT port is stuck (no AT commands go through) and that the QMI interface also is stuck (no QMI commands go through).

The only way to debug this kind of issue is to involve the manufacturer, and I strongly suggest that @Peter.Zhu-Q follow up on this issue with the reporter.

@Markham_thomas if this is easily reproducible in your location, you should try to gather QLog traces while reproducing the problem; although if it takes 3-4 days to reproduce, the amount of traces generated may be too much. I’m sure Quectel support knows how to deal with this, maybe you can enable some minimal traces or something like that, or maybe even collect crash dumps from the module or something.

Although after re-reading the thread, it looks like you can send AT commands while the network interface is stuck? That would be the first time I’ve seen that. If so, you can at least go on and reboot it cleanly instead of doing a hard power reset :wink:

I’m not sure what QLog traces are…
Since Peter didn’t give me any AT commands to debug this with, I picked some myself and found a few whose return values included -32768, which I’m told means invalid parameter.

It’s repeatable every 2 days or so, and far more often if I wire the antennas correctly to my 4x4 MIMO antenna.
In my quest to improve SINR I moved the antennas up for better line of sight to the towers (2 km away), and I thought the frequency of hangs increased slightly with the better signal.

Note: the TX hangs happen even when I have completely swapped the antennas for different 2x2 MIMO (x2) antennas. I’ve now tried 8 different antennas with no change in TX hang frequency.

There are times that I can send AT commands (not always).
I had the modem in QMI mode when I gathered those AT cmds you saw in the earlier post.

One of the developers on the ROOter forum told me to move it into ECM mode, as that had better recovery.
Here is the TX hang 2 days later with the modem in ECM mode (the USB modem just disappears):

[66399.036973] xhci-mtk 1e1c0000.xhci: xHCI host not responding to stop endpoint command.
[66399.044888] xhci-mtk 1e1c0000.xhci: xHCI host controller not responding, assume dead
[66399.054474] xhci-mtk 1e1c0000.xhci: HC died; cleaning up
[66399.059953] usb 1-1: USB disconnect, device number 2
[66399.070516] usb 2-1: USB disconnect, device number 2
[66399.075588] usb 2-1.4: USB disconnect, device number 3
[66399.081511] option1 ttyUSB0: GSM modem (1-port) converter now disconnected from ttyUSB0
[66399.089905] option 2-1.4:1.0: device disconnected
[66399.095751] option1 ttyUSB1: GSM modem (1-port) converter now disconnected from ttyUSB1
[66399.104216] option 2-1.4:1.1: device disconnected
[66399.111156] option1 ttyUSB2: GSM modem (1-port) converter now disconnected from ttyUSB2
[66399.119762] option 2-1.4:1.2: device disconnected
[66399.125790] option1 ttyUSB3: GSM modem (1-port) converter now disconnected from ttyUSB3
[66399.134171] option 2-1.4:1.3: device disconnected
[66399.139530] cdc_ether 2-1.4:1.4 usb0: unregister 'cdc_ether' usb-1e1c0000.xhci-1.4, CDC Ethernet Device

I think I’ll switch it back to QMI mode since at least I could send AT commands most of the time there.

OK, I have QLog set up on the router, with various filter files I can choose from. I’ll first set the modem back to QMI mode and, at the next TX hang, try to capture something after the hang.

qlog filter files
-rw-r--r-- 1 root root 2012 Mar 19 01:11 T1.LinuxData-OTA-DataService-AP_V01.cfg
-rw-r--r-- 1 root root 2260 Mar 19 01:11 T2.RegServ-CotextAct_V01.cfg
-rw-r--r-- 1 root root 2144 Mar 19 01:11 T3.SimpleData-(TCPUDP)_V01.cfg
-rw-r--r-- 1 root root 2116 Mar 19 01:11 T4.Throughput_V01.cfg
-rw-r--r-- 1 root root 4226 Mar 19 01:11 T5.COMMON-T1T4_V01.cfg
-rw-r--r-- 1 root root 4191 Mar 19 01:11 T6.FullMessage.SimpleLogPacket(AT)_V01.cfg
-rw-r--r-- 1 root root 2205 Mar 19 01:11 T7.V2X_ALL_V01.cfg
-rw-r--r-- 1 root root 3480 Mar 19 01:11 default.cfg

Dear Thomas,
If you still have problems, we can arrange our local FAE to support you.
In order to provide better support, could you provide some info about the project?
Thanks.

  • Project name (so we can refer to it in the future):

  • Application type:

  • Estimated Annual Units (for series production):

  • Project timeline:

  • Current status:

  • From which distributor do you buy Quectel modules ?

  • Do you have EVB kit for this application?

Hi Peter, thanks for the response.
I’m working with the OpenWrt ROOter project to get improved support for this modem into the router software.
The firmware issues have created a stumbling block for users of the RM500Q-AE, so many users have resorted to rebooting the modem daily to avoid the TX hang.
If I were to name the project it would be as follows:

Project name (so we can refer to it in the future):

Openwrt ROOter support for RM500Q-AE and GL

Application type:

Router software that controls various cellular modems and is found in most 4g/5g routers. This software
needs to be able to nicely recover from modem issues and not require the user to have to reboot it.

Estimated Annual Units (for series production):

As this is a 5G modem, anyone buying one of your modems and putting it in an OpenWrt ROOter controlled router will face this issue, as will anyone using your modem for 5G in other routers.

Project timeline: 

Support for the RM500Q is available now, however rebooting the router to recover from modem failures is being discussed in the software development forums. Many potential buyers of your modems are following the discussion.

Current status:

Modem works fine, but workarounds such as rebooting the modem/router combo to clear the modem issue are painful, and OpenWrt users are looking at competitors’ modems.

From which distributor do you buy Quectel modules ?

524wifi.com online order

Do you have EVB kit for this application?

I have an EVB kit for a competitor’s modem but not this one (still, it’s a Qualcomm).

= Regards, Markham

Thanks for your fast reply. Could you also provide your company name and the country you are located in? Thanks!

Peter,
I’m contributing time to the OpenWrt ROOter project, but not under any particular company name.
https://www.ofmodemsandmen.com is their website and here you will see a list of supported modems:

https://www.ofmodemsandmen.com/modems.html
Note that under the modem selection only: Quectel EC25 EM/EP06 BG96 EM12 EM20 are listed as supported.

I am in the U.S. testing an RM500Q-AE on the T-Mobile network for them because I have T-Mobile home internet and can test the modem for long periods with lots of data. The RM500Q is very new, and currently only special snapshot branches support it properly.
You can use the name: OpenWRT Rooter Project as a company name if you need one.
= Regards, Markham

Hello Thomas,
Thanks for your detailed reply. I have sent you an email and looped in our local FAE leader. Please check, thanks!

Hello Markham,

Were you able to resolve the issue?

No. I did get hooked up with a U.S. FAE, but it’s been 2 weeks since he last replied to my email.
I sent them qlog captures showing the TX hang, which they should be able to read using the Qualcomm tool.
We’re now up to around 7 people in the OpenWrt ROOter forums and elsewhere who see the same issue.
This is an example of the data I sent them:

> cat fail.log  (timestamp written after 30 seconds of no response from ping to 8.8.8.8)
> Sun Apr 4 22:49:50 CDT 2021
> 
> [5737.940]qlog_logfile_create /mnt/sda1/20210404_224519_0030.qmdl logfd=4
> [5742.028]recv: 4M 2K 1023B  in 4844 msec
> [5747.000]recv: 4M 3K 163B  in 4972 msec
> ... cut..  
> ^^^ working in most of this file
> [5808.233]recv: 2M 1011K 259B  in 5043 msec
> [5813.234]recv: 2M 287K 440B  in 5001 msec
> [5818.388]recv: 0M 731K 841B  in 5154 msec 
> [5820.854]recv: 4M 1K 781B  in 2466 msec
> ^^^ looks like it started having issues at the '0M' 2 lines up
> 
> [5905.954]qlog_logfile_create /mnt/sda1/20210404_224807_0031.qmdl logfd=4
> [5910.853]recv: 2M 651K 766B  in 5263 msec
> [5916.163]recv: 0M 35K 492B  in 5310 msec
> [5921.164]recv: 2M 722K 623B  in 5001 msec
> [5926.174]recv: 0M 985K 806B  in 5010 msec
> ^^^ failed with TX hang shortly into this file
>     expected since TX timeout happens +watchdog timeout later
> 
> [ 6083.878684] ------------[ cut here ]------------
> [ 6083.883329] NETDEV WATCHDOG: wwan0 (qmi_wwan): transmit queue 0 timed out
> [ 6083.890174] WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:448 dev_watchdog+0x2b8/0x2c0
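
To make the slowdown in those `recv:` lines easier to see, here is a minimal sketch that converts each line into a bytes-per-second rate and flags the intervals that fall well below the fastest one. It assumes the `M` and `K` fields are 1024-based multiples, which is a guess from the numbers rather than something documented; the function names and the 25% threshold are also hypothetical choices.

```python
import re

# Matches lines such as "[5742.028]recv: 4M 2K 1023B  in 4844 msec".
RECV_RE = re.compile(
    r"\[(?P<ts>\d+\.\d+)\]recv:\s+(?P<mb>\d+)M\s+(?P<kb>\d+)K\s+(?P<b>\d+)B"
    r"\s+in\s+(?P<ms>\d+)\s+msec"
)


def recv_rates(log_text):
    """Return (timestamp, bytes_per_second) for every recv line in the log."""
    rates = []
    for match in RECV_RE.finditer(log_text):
        # Assumption: M and K are binary multiples (1024-based).
        total = (int(match["mb"]) * 1024 * 1024
                 + int(match["kb"]) * 1024
                 + int(match["b"]))
        seconds = int(match["ms"]) / 1000
        rates.append((float(match["ts"]), total / seconds))
    return rates


def slow_intervals(rates, fraction=0.25):
    """Return the entries whose rate is below `fraction` of the fastest one."""
    if not rates:
        return []
    peak = max(rate for _, rate in rates)
    return [(ts, rate) for ts, rate in rates if rate < fraction * peak]


sample = """\
[5808.233]recv: 2M 1011K 259B  in 5043 msec
[5813.234]recv: 2M 287K 440B  in 5001 msec
[5818.388]recv: 0M 731K 841B  in 5154 msec
[5820.854]recv: 4M 1K 781B  in 2466 msec
"""

for ts, rate in slow_intervals(recv_rates(sample)):
    print(f"[{ts}] slow interval: {rate / 1024:.0f} KiB/s")
```

Note that these lines report how much log data QLog received from the modem, not user traffic, so a dip is only a rough indicator of when the modem started to struggle.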

Hello Markham,

That’s a shame. I’m considering returning my module. I wonder if the RM502Q-AE has these issues.

@Peter.Zhu-Q,

A large portion of your market is individuals who buy your modules for non-commercial use. It seems like @Markham_thomas is willing to provide you with the information to help you troubleshoot software/hardware issues with your product.