I'm having a number of problems with BC66 UDP communications and general OpenCPU crashes

benf · April 29, 2020, 4:03am

I’m using FW version BC66NBR01A10 and SDK version 1.5. I’m developing my application starting from the examples provided in the SDK.

Communication issue:

I am unable to receive messages after RRC goes idle. I send a message to a server, and the server responds. If the server responds quickly, there is no problem. But if the server responds after RRC goes idle (~6 seconds), the message is not received until I send another message, which brings RRC back to the connected state. Disabling eDRX has not made any difference.

OpenCPU issues:

I’m experiencing a number of crashes/resets. Many of them seem to occur during task switching, but some happen in the middle of tasks. None of them are consistent. It often crashes after just a minute or two of running, while sometimes it runs all night with no crashes at all. I’m sorry this is so vague; I’m so lost that I don’t even know what information is relevant to provide. I’m happy to add more specifics as requested.

Thanks,
Ben

Abner-Q · May 2, 2020, 2:00am

Hi Benf
You can call Ql_SleepDisable() to prevent the AP from entering idle mode.
After Ql_SleepDisable() is called, the module also cannot enter shallow sleep.
Once the downlink data from the server has been received, call Ql_SleepEnable() to allow the AP to enter idle mode again.
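A minimal sketch of that disable/enable pairing is below. It assumes the OpenCPU prototypes are `s32 Ql_SleepDisable(void)` and `s32 Ql_SleepEnable(void)`; to keep the sketch compilable off-target they are replaced by local stubs, and the helper names `awaiting_downlink_begin()` and `on_downlink_received()` are hypothetical. A counter keeps several outstanding requests from unbalancing the calls.

```c
#include <assert.h>
#include <stdbool.h>

/* Off-target stand-ins (hypothetical) for Ql_SleepDisable()/Ql_SleepEnable().
   On the module, call the real SDK functions here instead. */
static bool g_sleep_allowed = true;
static int sleep_disable_stub(void) { g_sleep_allowed = false; return 0; }
static int sleep_enable_stub(void)  { g_sleep_allowed = true;  return 0; }

/* Number of uplinks still waiting for a server reply. */
static int g_pending_replies = 0;

/* Call right after a message has been sent to the server. */
void awaiting_downlink_begin(void)
{
    if (g_pending_replies == 0) {
        sleep_disable_stub();  /* first outstanding request: keep the AP awake */
    }
    g_pending_replies++;
}

/* Call from the receive path once the server's reply has been read. */
void on_downlink_received(void)
{
    assert(g_pending_replies > 0);  /* every reply must match an earlier send */
    g_pending_replies--;
    if (g_pending_replies == 0) {
        sleep_enable_stub();   /* nothing left to wait for: allow sleep again */
    }
}
```

A reply that never arrives would leave sleep disabled indefinitely, so a real application would probably also re-enable sleep after a timeout.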

benf · May 4, 2020, 7:50pm

Thank you for your response. I tried disabling sleep, but unfortunately it does not solve my problem. The message-receiving behaviour is the same.

Abner-Q · May 6, 2020, 5:58am

Hi Benf
First, can you confirm whether the T3324 timer and PSM are enabled for your SIM card?

WizIO · May 6, 2020, 6:37am

Hi
Version 10 works fine for me…
Maybe you have a “dead” (endless) loop, or a call or parameter that is NULL.

benf · May 6, 2020, 9:07pm

Hi Abner,

My response to AT+CEREG? is: +CEREG: 5,1,“8C33”,“0A282A39”,9,0,0,“00100010”,“00111000”
That means that T3324 and PSM are turned on, right? My active period is 2 minutes, but for testing, I’m sending a message every 30 seconds. If the response from the server does not come within the RRC connected period of ~6 seconds after a message is sent, then it won’t arrive until the next message is sent 30 seconds after the previous one.

Hi WizIO,

Here are some specific examples of problems that sometimes happen (keep in mind, nothing is consistent and most things work fine most of the time).

I have a callback function for the QIRD command that does some tasks and then sends an OS message to another task. Sometimes, when this message is sent and the callback function returns, the program switches to the other task, prints a debug statement and then tries to take a mutex with Ql_OS_TakeMutex(s_iUniversalMutex,0xFFFFFFFF). It crashes immediately. Other times, the second task successfully finishes processing the message, returns to the beginning of its loop and calls Ql_OS_GetMessage(&msg) again to wait for the next message; at that point it crashes and restarts immediately.

There’s another spot in the middle of a function that sometimes crashes. I print a value with a debug statement (so I know it’s ‘1’), then I get and print the current task priority and remaining stack size. Then I have an if statement based on the value that is ‘1’ followed by a debug print. It sometimes crashes before the final debug print (sometimes it will make it one or two lines farther than this and then crash, or even crash in the middle of printing a debug statement).

APP_DEBUG("lat long valid: %d\r\n", jtracker_response.ref_ll_valid);
APP_DEBUG("ephem bitstream size: %d\r\n", jtracker_response.ephemeris_bitstream_size);

//sometimes crashes here

int iRet = 0;
iRet = Ql_OS_GetCurrenTaskLeftStackSize();
APP_DEBUG("\r\n<--Task[interpret response]: Task Remain Stack Size =%d-->\r\n", iRet);
iRet = Ql_OS_GetCurrentTaskPriority();
APP_DEBUG("\r\n<--Task[interpret response]: priority=%d-->\r\n", iRet);


//sometimes crashes here
if (jtracker_response.ref_ll_valid) {
  APP_DEBUG("new lat/long is: ");
  //sometimes crashes here
  APP_DEBUG("%f, %f\r\n",
            jtracker_response.ref_lat,
            jtracker_response.ref_lng);
  //sometimes crashes here
  jedi200_dev->get_point_parameters.ref_latitude = jtracker_response.ref_lat;
  jedi200_dev->get_point_parameters.ref_longitude = jtracker_response.ref_lng;
  APP_DEBUG("new lat/long is: %f, %f\r\n",
            jedi200_dev->get_point_parameters.ref_latitude,
            jedi200_dev->get_point_parameters.ref_longitude);
}

lat long valid: 1
ephem bitstream size: 0

<--Task[interpret response]: Task Remain Stack Size =4408-->

<--Task[interpret response]: priority=4-->

F1: 0000 0000
V0: 0000 0000 [0001]
00: 0006 000C
01: 0000 0000
U0: 0000 0001 [0000]
T0: 0000 00B4
Leaving the BROM
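Several of the crash points above sit around the float prints and the `jedi200_dev` dereference. One defensive variant, following the suggestion to check calls and parameters for NULL, is sketched below: it refuses a NULL device pointer and formats coordinates with integer conversions only. Avoiding `%f` is an assumption made in case the SDK's formatter has limited float support or a high stack cost; the struct definitions are simplified, hypothetical stand-ins for the application's own types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the application's types. */
struct point_params { double ref_latitude; double ref_longitude; };
struct jedi200 { struct point_params get_point_parameters; };

/* Format a coordinate with six decimals using only integer conversions. */
int format_coord(char *buf, size_t len, double value)
{
    assert(buf != NULL && len > 0);
    /* Round to the nearest millionth of a degree (fits in 32 bits for |value| <= 180). */
    long micro = (long)(value * 1000000.0 + (value < 0 ? -0.5 : 0.5));
    const char *sign = (micro < 0) ? "-" : "";  /* keeps the sign for values in (-1, 0) */
    if (micro < 0) {
        micro = -micro;
    }
    return snprintf(buf, len, "%s%ld.%06ld", sign, micro / 1000000L, micro % 1000000L);
}

/* Copy a new reference point, returning an error instead of faulting on NULL. */
int apply_ref_point(struct jedi200 *dev, double lat, double lng)
{
    if (dev == NULL) {
        return -1;
    }
    dev->get_point_parameters.ref_latitude = lat;
    dev->get_point_parameters.ref_longitude = lng;
    return 0;
}
```

On target, the result could then be printed with a plain `%s`, for example `APP_DEBUG("new lat is: %s\r\n", buf);`.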

Sometimes it only gets partway through printing a debug statement about sending a message and then crashes soon after. Here’s an example where it fails to print the full message it is sending and then restarts soon after:
<--Send AT:AT+QISENDEX=0,118,"BFF7D
OK
, ret = 0 -->
done with statemachine, count: 1

F1: 0000 0000
V0: 0000 0000 [0001]
00: 0006 000C
01: 0000 0000
U0: 0000 0001 [0000]
T0: 0000 00B4
Leaving the BROM

Sometimes, on restart, my custom tasks don’t start (each one is supposed to print a debug statement as its first line). There’s no crash here; they just don’t run.

Sometimes it can’t connect to the network but never reports an error; it just keeps trying forever. After a manual reset, it usually connects within 10 seconds.

If it makes it past the first 5 or 6 loops (30 seconds each), then it will run forever with no crashes.

The patterns are the same, as far as I can tell, between my custom pcb and the BC66 eval board.

How can I get more information about these crashes? Is there any way to get errors printed out? How can I change task priorities? Is it possible the version 10 firmware file I have is corrupted?

Thanks,
Ben

Abner-Q · May 7, 2020, 2:20am

Hi Benf

+CEREG: 5,1,“8C33”,“0A282A39”,9,0,0,“00100010”,“00111000” does not indicate whether the module has PSM enabled.
You can check whether PSM is enabled with the AT+CPSMS? query.
According to the results of your CEREG query above, the active time is 24 minutes, not 2 minutes.
In theory, downlink data can be received within the active time.

benf · May 7, 2020, 2:32am

+CPSMS: 1,“00111000”,“00100010”
I believe the order of TAU and active time is swapped between the CEREG response and the CPSMS response.
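The 2-versus-24-minute question comes down to which field is which. In 3GPP TS 27.007, a `+CEREG` response with `<n>`=5 ends with `<Active-Time>` followed by `<Periodic-TAU>`, while `+CPSMS` lists `<Requested_Periodic-TAU>` before `<Requested_Active-Time>`. Both values use the 3GPP TS 24.008 timer encodings (GPRS Timer 2 for the active time, GPRS Timer 3 for the periodic TAU): the top three bits select a unit and the low five bits give a multiplier. The sketch below decodes the bit strings under that reading; the function names are hypothetical, and it does not parse the full AT response.

```c
#include <assert.h>
#include <string.h>

/* Parse an 8-character bit string such as "00100010"; -1 if malformed. */
static int parse_bits(const char *s)
{
    int v = 0;
    assert(s != NULL);
    if (strlen(s) != 8) return -1;
    for (int i = 0; i < 8; i++) {
        if (s[i] != '0' && s[i] != '1') return -1;
        v = (v << 1) | (s[i] - '0');
    }
    return v;
}

/* T3324 Active-Time (GPRS Timer 2). Seconds, -1 = deactivated, -2 = malformed. */
long decode_active_time(const char *bits)
{
    int v = parse_bits(bits);
    if (v < 0) return -2;
    long value = v & 0x1F;          /* bits 5..1: multiplier */
    switch (v >> 5) {               /* bits 8..6: unit */
    case 0:  return value * 2;      /* 2 seconds */
    case 1:  return value * 60;     /* 1 minute */
    case 2:  return value * 360;    /* decihours (6 minutes) */
    case 7:  return -1;             /* timer deactivated */
    default: return value * 60;     /* other values are treated as 1 minute */
    }
}

/* Periodic-TAU / extended T3412 (GPRS Timer 3). Same return convention. */
long decode_periodic_tau(const char *bits)
{
    int v = parse_bits(bits);
    if (v < 0) return -2;
    long value = v & 0x1F;
    switch (v >> 5) {
    case 0:  return value * 600;      /* 10 minutes */
    case 1:  return value * 3600;     /* 1 hour */
    case 2:  return value * 36000;    /* 10 hours */
    case 3:  return value * 2;        /* 2 seconds */
    case 4:  return value * 30;       /* 30 seconds */
    case 5:  return value * 60;       /* 1 minute */
    case 6:  return value * 1152000;  /* 320 hours */
    default: return -1;               /* 7: timer deactivated */
    }
}
```

Under this reading, "00100010" decodes to 120 seconds (2 minutes) as an active time and "00111000" to 86400 seconds (24 hours) as a periodic TAU. Decoding "00111000" as an active time instead would give 24 minutes.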

I spoke with the service provider and they think the message reception issue might be SIM card related - they are sending me a new one.

WizIO · May 7, 2020, 5:54am

Yep, I see… Maybe it’s a deadlock followed by a watchdog reset.

So… Ql_OS_GetMessage() is a very important function: the handling for all callbacks (timers, UART, etc.) is hidden inside it. Quectel doesn’t document this, but it is essential information.
All kernel callbacks send messages to the thread’s queue, and Ql_OS_GetMessage() decodes these system messages and invokes the user callbacks.

Example: if you create a timer in the main thread, the kernel’s timer messages are posted to the main thread’s queue, and the user timer callback is executed in the main thread.
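The mechanism described above can be illustrated with a toy, single-threaded model of a per-task queue. This is a conceptual sketch, not Quectel’s implementation: `struct task`, `kernel_post_timer()`, `model_get_message()` and the other names are hypothetical. It shows two consequences: a timer callback only runs when its owning task calls the get-message function, and it runs in that task’s context, so a callback that blocks stalls that task’s loop.

```c
#include <assert.h>
#include <stddef.h>

/* Conceptual model only: NOT Quectel's implementation. */
#define QUEUE_LEN 8

typedef void (*timer_cb)(int task_id, void *arg);

struct msg        { int is_system; int timer_id; int user_value; };
struct task       { int id; struct msg queue[QUEUE_LEN]; int head; int count; };
struct soft_timer { int owner_task; timer_cb cb; void *arg; };

/* Append a message to a task's queue; returns -1 if the queue is full. */
static int enqueue(struct task *t, struct msg m)
{
    if (t->count == QUEUE_LEN) {
        return -1;
    }
    t->queue[(t->head + t->count) % QUEUE_LEN] = m;
    t->count++;
    return 0;
}

/* "Kernel" side: a timer expiry becomes a system message in the owner's queue. */
int kernel_post_timer(struct task *owner, int timer_id)
{
    struct msg m = { .is_system = 1, .timer_id = timer_id };
    return enqueue(owner, m);
}

/* Another task sends an ordinary application message. */
int post_user_message(struct task *dst, int value)
{
    struct msg m = { .is_system = 0, .user_value = value };
    return enqueue(dst, m);
}

/* Task side: system messages are consumed here and their callbacks run in
   this task's context; only user messages are returned to the caller. */
int model_get_message(struct task *self, struct soft_timer *timers, struct msg *out)
{
    while (self->count > 0) {
        struct msg m = self->queue[self->head];
        self->head = (self->head + 1) % QUEUE_LEN;
        self->count--;
        if (m.is_system) {
            struct soft_timer *t = &timers[m.timer_id];
            assert(t->owner_task == self->id);  /* only the registering task sees it */
            t->cb(self->id, t->arg);            /* a blocking callback stalls this loop */
            continue;
        }
        *out = m;
        return 1;
    }
    return 0;  /* the real call would block here waiting for a message */
}

/* Example callback: counts how many times it ran. */
void count_calls(int task_id, void *arg)
{
    (void)task_id;
    (*(int *)arg)++;
}
```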

If you register a timer in an OpenCPU task, you can only start or stop it (and receive its callback)
in that same task. Otherwise, the timer cannot be started or stopped.

If you call Ql_OS_TakeMutex() or wait on blocking events inside a timer callback, you will get a deadlock, and after some time the watchdog will reset the module.
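A common way to avoid that deadlock is to keep the callback non-blocking: it only records that the event happened, and the task loop does the mutex-protected work after the get-message call returns. The sketch below assumes, as described above, that soft-timer callbacks run in the same task as the loop, so a plain counter is enough; the names are hypothetical and `g_lock_held` merely stands in for the real mutex. A callback running as an ISR would need an interrupt-safe handoff instead.

```c
#include <assert.h>
#include <stdbool.h>

static int  g_pending_ticks = 0;   /* written by the callback, drained by the loop */
static int  g_shared_total  = 0;   /* state that would normally need the mutex */
static bool g_lock_held     = false; /* stand-in for the mutex, for illustration */

/* Timer callback: do NOT take the mutex or block here; just record the event. */
void on_timer_tick(void)
{
    g_pending_ticks++;
}

/* Called from the task loop after the get-message call returns,
   i.e. outside any callback, where blocking is allowed. */
void process_pending_work(void)
{
    while (g_pending_ticks > 0) {
        g_pending_ticks--;
        assert(!g_lock_held);  /* on target: take the mutex here */
        g_lock_held = true;
        g_shared_total++;      /* the work that needs the lock */
        g_lock_held = false;   /* on target: give the mutex back here */
    }
}
```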

Maybe Quectel can explain this better than I can. Sorry for my bad English.

benf · May 7, 2020, 7:05pm

If you register a timer in an OpenCPU task, you can only start or stop it (and receive its callback) in that same task. Otherwise, the timer cannot be started or stopped.

Does this apply to the RTC too? I was registering it in the main task and then starting it in a different task. I moved the register line to the task that I start it in, but it didn’t appear to make any difference. I definitely did not have any issue starting and using the timer before.

If you call Ql_OS_TakeMutex() or wait on blocking events inside a timer callback, you will get a deadlock, and after some time the watchdog will reset the module.

If I’m understanding this correctly, this failure mode would cause the system to hang for some period of time before the watchdog resets the system. This is not what happens for me. There is no delay at all - the program is running, printing line after line and then suddenly it has restarted and is printing the BROM messages. It behaves as if someone had pressed the ‘reset’ button in the middle of execution.

WizIO · May 8, 2020, 5:59am

Soft timer, UART and EINT callbacks are executed “hidden” inside Ql_OS_GetMessage().
The RTC callback is executed from an ISR.

No idea. Check your calls and parameters for NULL.
Also, firmware version 10 must be used with SDK 1.5 or higher.