openCPU Timer issue on BC660K

Hello all,

We are seeing a software reset on our module (BC660K), and we have not been able to find its cause.

So far, we suspect that the Ql_Timer_*() API is causing the software reset.

In summary, our firmware connects to a remote server via UDP and implements a “PINGREQ” mechanism: the firmware sends a PINGREQ and the server replies with a “PINGACK” (similar to the keepalive in MQTT).

This PING mechanism uses a timer registered with Ql_Timer_Register() and started with Ql_Timer_Start(), with the autorepeat flag set to true, so the timer runs forever with a 30-second interval.

Besides that, we have another timer that repeats every 60 seconds and checks the signal quality (AT+CSQ).

Both timers belong to the same task.
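The PINGREQ/PINGACK keepalive driven by the 30-second timer can be sketched as a tiny state machine. This is a hypothetical illustration, not the poster's firmware or a Quectel API: `keepalive_t`, the `keepalive_*` functions and the three-missed-pings limit are assumptions. The callback registered with Ql_Timer_Register() would call `keepalive_on_timer()` and send a PINGREQ when it returns true, and the UDP receive path would call `keepalive_on_ack()` when a PINGACK arrives.

```c
#include <stdbool.h>

/* Hypothetical threshold: treat the link as down after this many unanswered pings. */
#define KEEPALIVE_MAX_MISSED 3

typedef struct {
    bool awaiting_ack;  /* a PINGREQ was sent and no PINGACK has arrived yet */
    unsigned missed;    /* consecutive PINGREQs that went unanswered */
} keepalive_t;

void keepalive_init(keepalive_t *ka) {
    ka->awaiting_ack = false;
    ka->missed = 0;
}

/* Call on every timer expiry. Returns true if a PINGREQ should be sent now,
 * false if too many pings went unanswered and the caller should reconnect. */
bool keepalive_on_timer(keepalive_t *ka) {
    if (ka->awaiting_ack) {
        ka->missed++;            /* the previous PINGREQ timed out */
    }
    if (ka->missed >= KEEPALIVE_MAX_MISSED) {
        return false;
    }
    ka->awaiting_ack = true;     /* caller sends the PINGREQ over UDP */
    return true;
}

/* Call when a PINGACK is received from the server. */
void keepalive_on_ack(keepalive_t *ka) {
    ka->awaiting_ack = false;
    ka->missed = 0;
}
```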

Debugging this reset is difficult, so any help is welcome; we have not yet figured out the root cause of the software reset.

You can call the RIL API and then query the signal quality with the AT+CSQ command.
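Sending AT+CSQ returns a line of the form `+CSQ: <rssi>,<ber>` (3GPP TS 27.007), where `<rssi>` 0 to 31 maps to -113 to -51 dBm in 2 dB steps and 99 means "not known or not detectable". A minimal sketch for turning that line into dBm follows; `csq_parse` is a hypothetical helper, and how the response line reaches it (for example, through an RIL response callback) depends on the SDK and is not shown.

```c
#include <stdbool.h>
#include <stdio.h>

/* Parse a "+CSQ: <rssi>,<ber>" response line.
 * On success, stores the signal level in dBm and the raw <ber> value.
 * Returns false if the line is malformed or <rssi> is 99 (unknown). */
bool csq_parse(const char *line, int *dbm, int *ber) {
    int rssi, b;
    if (sscanf(line, "+CSQ: %d,%d", &rssi, &b) != 2) {
        return false;
    }
    if (rssi < 0 || rssi > 31) {
        return false;  /* 99 = unknown; other values are out of range */
    }
    *dbm = -113 + 2 * rssi;
    *ber = b;
    return true;
}
```

For example, `+CSQ: 20,99` gives -73 dBm with an unknown bit error rate.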

Hello @herbert.pan-Q,

Sorry, I didn’t quite understand your answer.

To be clearer: I suspect that the software reset happens when the two timers overlap (at some point they will fire together, because the first runs every 15 seconds and the second every 60 seconds). Other than that, I do not see what could be causing it.
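The overlap can be checked with simple arithmetic: two auto-repeat timers started at the same moment with periods T1 and T2 fire on the same tick every lcm(T1, T2) seconds. A small sketch (the function names are illustrative, not SDK functions):

```c
/* Greatest common divisor (Euclid's algorithm). */
unsigned gcd_u(unsigned a, unsigned b) {
    while (b != 0) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Interval after which two periodic timers started together coincide again. */
unsigned lcm_u(unsigned a, unsigned b) {
    return a / gcd_u(a, b) * b;  /* divide first to limit overflow */
}
```

With the periods mentioned in this thread (15 s or 30 s for the PING timer, 60 s for the CSQ timer), `lcm_u` returns 60, so if both timers were started together they would coincide once a minute. If the overlap alone caused the reset, one might expect the resets to track that once-a-minute pattern.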

But, as I mentioned, this problem is very hard to debug, so any ideas on how to debug such corner cases are welcome.

Unfortunately, the software reset happens at random times. In my tests it has occurred 20 seconds after boot, and at other times after 10 minutes, 15 minutes, and so on; it is non-deterministic.

We identified the problem using the EPAT tool.

Our task stack was too small, and EPAT reported a stack overflow in that task.
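Where a tool like EPAT is not available, a common generic way to spot this kind of problem is stack painting: fill a task's stack with a known byte pattern before use, then later count how much of the pattern is still intact (the "high-water mark"). The sketch below simulates this on an ordinary array; the fill value 0xA5, the downward-growing stack, and the function names are assumptions, and the count is approximate because real data can happen to equal the fill byte.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL 0xA5u  /* assumed paint pattern */

/* Paint the whole stack region before the task starts using it. */
void stack_paint(uint8_t *stack, size_t size) {
    memset(stack, STACK_FILL, size);
}

/* For a stack that grows downward (from stack[size-1] toward stack[0]),
 * count the bytes at the low end that were never overwritten.
 * 0 means the task reached the end of its stack: a likely overflow. */
size_t stack_unused_bytes(const uint8_t *stack, size_t size) {
    size_t n = 0;
    while (n < size && stack[n] == STACK_FILL) {
        n++;
    }
    return n;
}
```

On the module itself, RTOS kernels often offer an equivalent query (CMSIS-RTOS2, for instance, has osThreadGetStackSpace()); whether the BC660K OpenCPU SDK exposes one is not confirmed here.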

Dear cassianocampes,

Has the problem been solved?

Yes, increasing the stack size of the task solved the issue.

No software resets have been seen since.

Hello, could you explain what you did?
I think you changed the custom_tsk_cfg file, am I right? Could you give details about the solution?

Thank you

Hello @satilla,

Yes, I have some tasks like this:

TASK_ITEM("main_task_AAA", 1024, osPriorityNone, mainTaskAAAEntry, mainTaskAAAInit)
TASK_ITEM("main_task_BBB", 1024, osPriorityNone, mainTaskBBBEntry, mainTaskBBBInit)
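TASK_ITEM lines like these are presumably expanded by the SDK into a task table (an X-macro pattern), so the second argument is the stack size to enlarge when a task overflows. The sketch below is hypothetical: the real expansion, the units of the stack size, and the SDK's `osPriority_t` may differ, the task functions are empty stubs, and 2048 is an illustrative value (the thread does not say which size was chosen).

```c
#include <stddef.h>

/* Stub stand-ins for SDK types and task functions (hypothetical). */
typedef enum { osPriorityNone = 0 } osPriority_t;
static void mainTaskAAAEntry(void *arg) { (void)arg; }
static void mainTaskAAAInit(void) {}
static void mainTaskBBBEntry(void *arg) { (void)arg; }
static void mainTaskBBBInit(void) {}

typedef struct {
    const char *name;
    size_t stack_size;
    osPriority_t priority;
    void (*entry)(void *);
    void (*init)(void);
} task_cfg_t;

/* Expand each TASK_ITEM into one initializer of the task table. */
#define TASK_ITEM(name, stack, prio, entry, init) { name, stack, prio, entry, init },
const task_cfg_t task_table[] = {
    TASK_ITEM("main_task_AAA", 2048, osPriorityNone, mainTaskAAAEntry, mainTaskAAAInit) /* was 1024 */
    TASK_ITEM("main_task_BBB", 1024, osPriorityNone, mainTaskBBBEntry, mainTaskBBBInit)
};
#undef TASK_ITEM

const size_t task_count = sizeof task_table / sizeof task_table[0];
```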

I found that the stack size of main_task_AAA (for example) was too small, and the EPAT tool is what showed me it was a stack overflow. The tool comes with the Quectel SDK and runs on Windows only. If you are using the BC660K-GL-TE-B kit, you can simply connect the dev kit with the USB cable and run EPAT to capture the log messages.

In my case, as I said, the firmware was resetting because of a stack overflow. I remember that the log showed which task was overflowing (main_task_AAA, for example), so I increased that task's stack size.

Hello @cassianocampes,

I am developing a project on the M66, and the reset problem has been happening on every boot for 2 years. I added the timer to the application about 2 years ago.
These are my default settings:

TASK_ITEM(proc_main_task, main_task_id, 10*1024, DEFAULT_VALUE1, DEFAULT_VALUE2)
TASK_ITEM(proc_reserved1, reserved1_id, 5*1024, DEFAULT_VALUE1, DEFAULT_VALUE2)
TASK_ITEM(proc_reserved2, reserved2_id, 5*1024, DEFAULT_VALUE1, DEFAULT_VALUE2)

I could not find the EPAT tool in the M66 SDK; I think it is specific to the BC660 SDK.

Hello @satilla,

Unfortunately, I do not have experience with the M66. What I recommend is to read the module's Quick Start Guide and check whether it has a debug UART. In my case, the BC660K does have one.

You will probably find information about it in the Quick Start Guide. The EPAT tool helped me a lot in tracking down problems such as random resets.

Update 1:
I checked the M66 Hardware Design document, and it has a DBG_TXD/DBG_RXD UART port. I am not sure whether it is compatible with the EPAT tool, but at least you should be able to access the debug UART port.

First, you can try attaching an FTDI USB-serial adapter to this DBG port and see whether you capture any data. As far as I remember, when I accessed the DBG UART directly, it printed a lot of data that was not human-readable (I think the baud rate was 115200). That is a good sign that the DBG UART is outputting something, even if it looks like garbage.
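To confirm that bytes are coming out of the DBG UART before getting EPAT, a minimal POSIX sketch for a USB-serial adapter on Linux or macOS follows. The device path (for example /dev/ttyUSB0) and the 115200 baud rate are assumptions based on the recollection above, and `open_raw_serial`, `hex_line` and `dump_serial` are hypothetical helper names. The output is just a hex dump of the raw stream; decoding it into readable logs is what EPAT is for.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <termios.h>
#include <unistd.h>

/* Open a serial device read-only in raw 8N1 mode at 115200 baud.
 * Returns a file descriptor, or -1 on error. */
int open_raw_serial(const char *path) {
    int fd = open(path, O_RDONLY | O_NOCTTY);
    if (fd < 0) return -1;

    struct termios tio;
    if (tcgetattr(fd, &tio) != 0) { close(fd); return -1; }
    tio.c_iflag &= ~(IGNBRK | BRKINT | PARMRK | ISTRIP | INLCR | IGNCR | ICRNL | IXON);
    tio.c_oflag &= ~OPOST;
    tio.c_lflag &= ~(ECHO | ECHONL | ICANON | ISIG | IEXTEN);
    tio.c_cflag &= ~(CSIZE | PARENB | CSTOPB);
    tio.c_cflag |= CS8 | CREAD | CLOCAL;
    tio.c_cc[VMIN] = 1;   /* block until at least one byte arrives */
    tio.c_cc[VTIME] = 0;
    if (cfsetispeed(&tio, B115200) != 0 || cfsetospeed(&tio, B115200) != 0 ||
        tcsetattr(fd, TCSANOW, &tio) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Format n bytes as "xx xx ..." into out, which must hold 3*n + 1 chars. */
void hex_line(const unsigned char *buf, size_t n, char *out) {
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        sprintf(out + 3 * i, "%02x ", buf[i]);
    }
    if (n > 0) out[3 * n - 1] = '\0';  /* drop the trailing space */
}

/* Read until EOF or error, printing each chunk as one hex line. */
void dump_serial(int fd) {
    unsigned char buf[16];
    char line[3 * sizeof buf + 1];
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n <= 0) break;
        hex_line(buf, (size_t)n, line);
        printf("%s\n", line);
    }
}
```

Typical use is `int fd = open_raw_serial("/dev/ttyUSB0"); if (fd >= 0) dump_serial(fd);`.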

The second step is to get the EPAT tool (Windows only) and open the UART with it; EPAT will show the logs in human-readable form.

For the EPAT tool, you can ask Quectel staff.
