Socket and connection status check in BG96?

So, the situation is as follows. To understand how TCP works on the BG96, I am checking its behavior under different real-life conditions such as no network, server failure and so on.

The first thing I did was set qapi_recv to non-blocking mode and send some data to the server in a loop.
That works fine; data is sent and received.
Each time I call:
sent_len = qapi_send(sock_fd, buff, SENT_BUF_SIZE, 0);
sent_len is the same as SENT_BUF_SIZE (that is understandable, although it is strange that the example always sends that length instead of the real length of buff, but let’s skip that for now).

The question is this: if I remove the antenna to emulate a no-network condition, qapi_send keeps sending data and returns a valid value. If I attach the antenna again, all of that data arrives at the server. It looks like there is some cache, which I can’t find in the declarations, so I don’t know how big it is. Next: how do I check the actual status of the connection/network so that I don’t cache too much? I found that if I keep the device without an antenna for a long time it gets stuck; I guess a buffer overflow happens and crashes it.

Grisha, the TCP/IP stack should work like POSIX; there is no “cache”… send/recv put data into the network buffers.
Take a step back and make a good implementation of the DSS call.

Right now I am using the example project supplied with the SDK.
There, the DSS initialization and the whole connection handling are done in a separate thread.

The only change I made: originally there was a single send followed by a blocking wait for the reply. I removed the blocking and made it send in a loop with a 1-second period. All works fine, as I said.

But if I remove the antenna, the modem loses the network, yet the data-sending part keeps acting as if there is a network and a connection. When I put the antenna back and the network returns, it flushes all the collected data to the server. Where is all this data collected while the network is not there? And how is sending recovered in that case with no reconnection (all data goes over the same connection and socket on both sides)?
If there is no network for a long time, the modem gets stuck and is restarted, by the watchdog I guess. That’s why I want to find where the data is cached and how to get information about the network condition to prevent hanging.

Hi Grisha,

As you know, TCP is a connection-oriented protocol. If the signal is bad, then when qapi_send() sends a data packet the module waits for an ACK from the network; if the module does not receive the ACK, the data is re-sent. As a result the buffer fills up, and the return value becomes 0 or negative. Therefore your code needs a check of whether the “send len” is greater than 0.
If it is not, the system marks the handle (sock_fd) as invalid and then closes the TCP session. I attached part of the example code below for your reference.
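
The return-value check described above could be sketched roughly as follows. This is only an illustration, not SDK code: `send_fn`, `send_all` and the `fake_send` stub are hypothetical names, and the hook merely assumes a POSIX-style call that returns the number of bytes accepted, or 0 or a negative value on failure. Note that in POSIX-style stacks a positive return generally means the stack accepted the bytes, not that the peer received them.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical hook with the same shape as a POSIX-style send call:
 * returns the number of bytes accepted, or 0 / negative on failure. */
typedef int (*send_fn)(int fd, const char *buf, int len, int flags);

/* Push the whole buffer through do_send, retrying after partial writes.
 * Returns 0 once every byte was accepted, -1 on the first failure. */
static int send_all(send_fn do_send, int fd, const char *buf, int len)
{
    assert(do_send != NULL && buf != NULL && len >= 0);
    int off = 0;
    while (off < len) {
        int n = do_send(fd, buf + off, len - off, 0);
        if (n <= 0)
            return -1;      /* caller should close fd and reconnect */
        off += n;
    }
    return 0;
}

/* Off-target stub (assumption): accepts at most 4 bytes per call and
 * records them, so the loop above can be exercised on a PC. */
static char fake_out[64];
static int fake_total;

static int fake_send(int fd, const char *buf, int len, int flags)
{
    (void)fd;
    (void)flags;
    int n = len < 4 ? len : 4;
    if (fake_total + n > (int)sizeof fake_out)
        return -1;
    memcpy(fake_out + fake_total, buf, (size_t)n);
    fake_total += n;
    return n;
}
```

On the module, a thin wrapper around qapi_send with matching argument types would be passed as `do_send` instead of the stub.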

That is exactly what I am referring to. It keeps returning a valid value when the antenna is removed, and the data collects somewhere.
If I close the port on the server side I get -1 there, but not in the network-loss condition.

Print all DSS events

and remove the antenna

All DSS events are printed as in the example. No event comes when I remove the antenna, no matter how long the antenna is off.
It really feels like the FW version for the modem with LLVM support has some issues with network detection in general :(

Hi Grisha,


I think it means that the send buffer is still not full. If the system crashes at this point, please send an email to support@quectel.com; if the system does not crash, I think it is a normal state.

What is the size of the buffer?
What happens to the data in that buffer if the network never recovers, or if the server is down when it does? Or if the battery runs out? In this condition I can’t check from the app whether data was really sent or is being kept somewhere in a buffer, so I can’t correctly decide whether I can delete it.
That may cause packet loss, so it does not look like a solid solution.
And yes, sometimes it crashes when kept like that for too long, but that is not the main issue right now.

Hi Grisha,

  1. In the APP layer, the default buffer size is 128; at the bottom level, the buffer size is fixed by Qualcomm.

  2. TCP is a reliable transport-layer protocol, so you do not need to worry about this; data will not be lost. If transmission fails, the socket handle will be closed.

  3. sent_len is the return value of the TCP send. If sent_len > 0, it means you have sent the data to the network successfully; if not, you need to resend it in your code.

Hi,

  1. That is not the buffer with the cumulative effect that we are talking about. It is just the array we pass to the function.

  2. TCP is reliable, but I can easily offer a real-life scenario that causes data loss with no way to detect it on the device side:
    a) The device is sending normally and the server receives the data. The device verifies that data was sent by checking the returned sent_len value.
    b) Network loss happens (this can be a vehicle-mounted device that left the coverage area or entered a tunnel or a parking garage). Packets get cached in the cache I am talking about, sent_len is still valid, and the device assumes the data is sent.
    c) Here there are two possible outcomes. If the network comes back soon and the connection was not closed on the server side, all is fine. BUT! If the server side closed the socket during the no-network stage (for example due to inactivity or a missing reply), all collected data will be lost, and there is no way to detect that on the device side because sent_len was correct the whole time.
    So, as you can see, it is extremely important to know at least the buffer size, to estimate how much data can potentially be lost in that scenario, and in the best case to have some indication that data was delayed. From what I saw, it is more than 200 bytes, according to the data I was able to collect there.
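
One common way to make scenario c) detectable on the device is an application-level acknowledgement: keep each record until the server explicitly confirms it, and delete only then. The sketch below is a minimal illustration under that assumption; the queue layout, the names (`pending_q`, `pq_push`, `pq_ack`) and the cumulative-ack scheme are all hypothetical, and the server would have to be changed to send such acknowledgements. Sequence-number wrap-around is ignored for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_PENDING 8    /* illustrative limits, not SDK values */
#define MAX_PAYLOAD 32

typedef struct {
    uint32_t seq;                /* sequence number sent with the record */
    uint8_t  len;
    char     data[MAX_PAYLOAD];
} record_t;

typedef struct {
    record_t slot[MAX_PENDING];
    int      head;               /* index of oldest unacknowledged record */
    int      count;              /* records still waiting for an ack */
    uint32_t next_seq;
} pending_q;

static void pq_init(pending_q *q)
{
    memset(q, 0, sizeof *q);
    q->next_seq = 1;             /* 0 is reserved as the failure value */
}

/* Store a record before sending it. Returns its sequence number,
 * or 0 if the queue is full or the payload is too large. */
static uint32_t pq_push(pending_q *q, const char *data, size_t len)
{
    assert(q != NULL && data != NULL);
    if (q->count == MAX_PENDING || len > MAX_PAYLOAD)
        return 0;
    record_t *r = &q->slot[(q->head + q->count) % MAX_PENDING];
    r->seq = q->next_seq++;
    r->len = (uint8_t)len;
    memcpy(r->data, data, len);
    q->count++;
    return r->seq;
}

/* Server confirmed everything up to and including acked_seq:
 * only now is it safe to drop those records. */
static void pq_ack(pending_q *q, uint32_t acked_seq)
{
    while (q->count > 0 && q->slot[q->head].seq <= acked_seq) {
        q->head = (q->head + 1) % MAX_PENDING;
        q->count--;
    }
}
```

After a reconnect, everything still in the queue can be re-sent, and a full queue (`pq_push` returning 0) is a natural signal that data is being delayed.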

Regards
Grigor

Hi Grisha,

sent_len = qapi_send(sock_fd, sent_buff, strlen(sent_buff), 0);

As you know, sent_len is the return value of qapi_send(), and qapi_send() calls the TCP protocol. So in your scenarios b) and c), while TCP cannot receive an ACK from the network, this return value will be changed to < 1 by qapi_send(). You just need to add a check of whether sent_len > 0; if not, please resend your string.

If the socket is blocking, send(…) will block until the ACK or exit after the maximum timeout.
If it is non-blocking, it must return something like EWOULDBLOCK.
The MTU is by “default” between 1024 and 1500, but if MMU translation between kernel and user space (with a queue) is involved, maybe there is a bug.

Or the DSS response timeout (without the antenna) is too big.
On SDK2, DSS responds and the connection is closed.
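
For comparison, here is how a non-blocking send behaves on an ordinary POSIX system (Linux assumed); this does not use the BG96 SDK and says nothing certain about its stack. With nobody reading on the other end, send() keeps returning positive values while the local socket buffer has room, and only then fails with EAGAIN/EWOULDBLOCK. The function name is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns how many bytes a non-blocking send() accepted before the
 * local buffer filled up, or -1 on an unexpected error. */
static long bytes_accepted_without_reader(size_t chunk_len)
{
    char chunk[256];
    int sv[2];
    long total = 0;

    assert(chunk_len > 0 && chunk_len <= sizeof chunk);
    memset(chunk, 'x', sizeof chunk);

    /* A connected local socket pair stands in for a TCP connection. */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    int fl = fcntl(sv[0], F_GETFL, 0);
    if (fl < 0 || fcntl(sv[0], F_SETFL, fl | O_NONBLOCK) != 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    for (;;) {
        ssize_t n = send(sv[0], chunk, chunk_len, 0);
        if (n > 0) {
            total += n;          /* accepted locally, nobody has read it */
            continue;
        }
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            break;               /* buffer full: the expected stop */
        total = -1;              /* anything else is unexpected */
        break;
    }

    close(sv[0]);
    close(sv[1]);
    return total;
}
```

The exact total depends on the system's buffer sizes, but it is greater than zero even though not a single byte was ever read by the peer, which is the same effect as a valid sent_len with no network.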

That is exactly what I mean: when there is no network, I still get a VALID sent_len return value from qapi_send. I will make a video for you later to show that effect and both outcomes of the scenario I described.

Hi Stephen,

Sorry for the delay, I was a bit busy.
So, here is a video where I show some scenarios in which sent_len can’t be used to validate data receipt and data can be lost even when sent_len is > 0.


About the network status check, we spoke via email (status = qapi_Device_Info_Set_Callback_v2(device_info_hndl, QAPI_DEVICE_INFO_NETWORK_IND_E, tcp_network_indication_cb)). The return value is fine now; that is fixed and I am getting calls to tcp_network_indication_cb, but each time I try to access the data passed to tcp_network_indication_cb the app crashes. Please send a sample declaration of tcp_network_indication_cb; it looks like I am using the wrong arguments for it.

Regards
Grigor

For me, this is a critical issue :(
The DSS call does not respond correctly; there is a disconnect event: (it works in SDK2, firmwares BG96…A2…)


and network sockets are not synchronized between kernel and user space
+CREG responds 0 and 2 … and the socket is “alive”?!

Yes, the socket stays ALIVE when CREG shows no network. And it even sends the cached data if the network recovers within some short period. Sounds crazy though…
This way we never know whether data was sent or not.
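
Since the socket does not reflect registration state, one possible workaround is to gate sending on the +CREG status itself. The helper below is a hedged sketch: it only parses a "+CREG: <n>,<stat>" read response, treating stat 1 (registered, home network) and 5 (registered, roaming) as registered, as defined for AT+CREG in 3GPP TS 27.007. How the application obtains that response text on the BG96 is not shown and is left as an assumption; `creg_is_registered` is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if a "+CREG: <n>,<stat>" read response reports registration
 * (stat 1 = home network, 5 = roaming), 0 otherwise or on parse failure. */
static int creg_is_registered(const char *resp)
{
    int n = 0, stat = 0;

    assert(resp != NULL);
    /* The leading space in the format skips any CR/LF before the text. */
    if (sscanf(resp, " +CREG: %d,%d", &n, &stat) != 2)
        return 0;
    return stat == 1 || stat == 5;
}
```

With this, the 0 and 2 responses mentioned above both map to "not registered", so the application could pause its send loop instead of handing more data to the stack.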

I used part of the tcp_client sample; a call disconnect event is handled there, but I never actually received it in test conditions.

Hi Grisha,

If the module crashes, we have more professional tools to analyze it.

Please collect your dump log with the tools linked below and send the log to us (support@quectel.com). The dump log tools link:

https://cnquectel-my.sharepoint.com/:f:/g/personal/america-fae_quectel_com/EmuR9cKm_7FNqtsO37DOVw8Bv-MI-8k6gq3UDLNpR1-WAg?e=yX7AJN

BTW: I do not know why, but I cannot open the video link from my side.


Stephen, the video is just a YouTube video. If you are in China you may need to use a VPN. I can’t upload it anywhere else because it is big (about 2.4 GByte), but it shows a critical issue with sockets, so I hope you will find a way to look at it.
About the crash: can you please first give me a sample of the correct definition of the callback function? I don’t want to spend a few days digging into that only to hear in the end “hey, you have the callback definition wrong” and reply “yeah, I told you at the beginning that may be the reason”.
So, please show HOW the callback should be defined. If that does not work, I will follow your advice and do the deep debugging.

Btw, maybe you have some messenger such as Skype, or any other, that we can use? Our company is trying Quectel for the first time and we are very limited on time. We don’t have many questions to ask, but sometimes we have short questions that need a rapid and exact answer, so I would highly appreciate it if we could set up some better communication channel than the forum.