Update at 13:22:00 on 02/02/2015
This morning I had an email to say the WIFI loss was fixed in the MQTT code. I downloaded the latest MQTT software and put my code back into it. I turned the WIFI off for a few seconds then back on – that worked, sending out the message backlog and re-subscribing to messages. I then turned the WIFI off until the 2K queue buffer was full – that I guess is the ultimate test. That took several minutes. I turned the WIFI back on and…no. BUT we understand what’s wrong. The queue is full – and hence there is no room to re-subscribe. This is now fixed here https://github.com/tuanpmt/esp_mqtt. Everything seems to operate perfectly. That is important for a number of reasons as it suggests that there is nothing wrong with the SDK.
Original blog:
This is getting to a be a problem. I don’t know if it is the MQTT software or ESPRESSIFs SDK but there is an issue if you lose WIFI connection. I tested this on the road this week but now I’m back in the office I’m doing it a bit more scientifically.
So – I know that the WIFI here is perfect. I also know that the SDK is case sensitive so in order to simulate loss of signal it’s simply a matter of renaming the “office” WIFI to something else or just change case (my computer is hardwired) to test.
Here’s the result of this mornings test.
So what we have is an ESP-01 – I’m using the latest SDK (0.9.5) on a PC using Eclipse and using Tuan’s MQTT software as a base (NOT the LUA version – so I’m programming in C). The basic MQTT software has a number of subscriptions… including the time. A separate system updates the time every minute (Linux timestamp – seconds since 1974). The board has a temperature sensor which on a timer sends out the temperature every 5 seconds (using code that does not involve long delays).
Here is the log.. “in the beginning” the router has the WRONG name so that the board cannot see the WIFI.. It sits in a loop waiting….
no office found, reconnect after 1s
STATION_IDLE
STATION_IDLE
reconnect
STATION_IDLE
STATION_IDLE
STATION_IDLE
STATION_IDLE
scandone
STATION_IDLE
add 0
aid 1
pm open phy_2,type:2 0 0
cnt
STATION_IDLE
At this point I turn on the WIFI – and all is well, it is picked up – the board starts up and subscribes to the various topics.
After about a minute, it gets the time – and now starts transmitting the temperature (the sensor isn’t hooked up so it’s just sending dummy info)
dhcp client start…
STATION_IDLE
STATION_IDLE
ip:192.168.0.9,mask:255.255.255.0,gw:192.168.0.138
TCP: Connect to domain home.scargill.org:1884
DNS: found ip 109.170.132.114
TCP: connecting…
MQTT: Connected to broker home.scargill.org:1884
MQTT: Sending, type: 1, id: 0000
TCP: Sent
TCP: data received 4 bytes
MQTT: Connected to home.scargill.org:1884
MQTT: Connected
MQTT: queue subscribe, topic”output”, id: 1
MQTT: queue subscribe, topic”mytime”, id: 2
MQTT: queue subscribe, topic”time”, id: 3
MQTT: queue subscribe, topic”dusk”, id: 4
MQTT: queue subscribe, topic”dawn”, id: 5
MQTT: queue subscribe, topic”peak”, id: 6
MQTT: queue subscribe, topic”off_peak”, id: 7
MQTT: queue subscribe, topic”frost”, id: 8
MQTT: queue subscribe, topic”on_1″, id: 9
MQTT: queue subscribe, topic”off_1″, id: 10
MQTT: queue subscribe, topic”on_2″, id: 11
MQTT: queue subscribe, topic”off_2″, id: 12
MQTT: queue subscribe, topic”timestring”, id: 13
MQTT: Sending, type: 8, id: 0001
TCP: Sent
TCP: data received 5 bytes
MQTT: Subscribe successful
MQTT: Sending, type: 8, id: 0002
TCP: Sent
TCP: data received 5 bytes
MQTT: Subscribe successful
MQTT: Sending, type: 8, id: 0003
TCP: Sent
TCP: data received 5 bytes
MQTT: Subscribe successful
MQTT: Sending, type: 8, id: 0004
TCP: Sent
TCP: data received 5 bytes
MQTT: Subscribe successful
MQTT: Sending, type: 8, id: 0005
TCP: Sent
TCP: data received 5 bytes
MQTT: Subscribe successful
MQTT: Sending, type: 8, id: 0006
TCP: Sent
TCP: data received 5 bytes
MQTT: Subscribe successful
MQTT: Sending, type: 8, id: 0007
TCP: Sent
TCP: data received 5 bytes
MQTT: Subscribe successful
MQTT: Sending, type: 8, id: 0008
TCP: Sent
TCP: data received 5 bytes
MQTT: Subscribe successful
MQTT: Sending, type: 8, id: 0009
TCP: Sent
TCP: data received 5 bytes
MQTT: Subscribe successful
MQTT: Sending, type: 8, id: 000A
TCP: Sent
TCP: data received 5 bytes
MQTT: Subscribe successful
MQTT: Sending, type: 8, id: 000B
TCP: Sent
TCP: data received 5 bytes
MQTT: Subscribe successful
MQTT: Sending, type: 8, id: 000C
TCP: Sent
TCP: data received 5 bytes
MQTT: Subscribe successful
MQTT: Sending, type: 8, id: 000D
TCP: Sent
TCP: data received 5 bytes
MQTT: Subscribe successful
TCP: data received 18 bytes
MQTT topic: time, data: {1422703981}
TCP: data received 62 bytes
MQTT topic: timestring, data: {11:33 Saturday 31-01-2015}
Get another published message
TCP: data received 23 bytes
MQTT topic: dawn, data: {439}
Get another published message
TCP: data received 12 bytes
MQTT topic: dusk, data: {1042}
MQTT: queuing publish, length: 17, queue size(0/2048)
MQTT: Sending, type: 3, id: 0000
TCP: Sent
MQTT: Published
MQTT: queuing publish, length: 17, queue size(0/2048)
MQTT: Sending, type: 3, id: 0000
TCP: Sent
MQTT: Published
TCP: data received 13 bytes
MQTT topic: mytime, data: {400}
MQTT: queuing publish, length: 66, queue size(0/2048)
Time: 11:33:09 31/01/2015 Dawn=439 Dusk=1042 Tick=86392MQTT: Sending, type: 3, id: 0000
TCP: Sent
MQTT: Published
MQTT: queuing publish, length: 17, queue size(0/2048)
MQTT: Sending, type: 3, id: 0000
TCP: Sent
MQTT: Published
MQTT: queuing publish, length: 17, queue size(0/2048)
MQTT: Sending, type: 3, id: 0000
TCP: Sent
As you can see above – the unit is sending out temperature and the outgoing queue is not building up – all is well. So now I disconnect the WIFI.
pm close 7 0 0/96959550
TCP: Reconnect to home.scargill.org:1884
reconnect
STATION_IDLE
scandone
no office found, reconnect after 1s
STATION_IDLE
STATION_IDLE
scandone
STATION_IDLE
STATION_IDLE
STATION_IDLE
STATION_IDLE
scandone
add 0
aid 1
pm open phy_2,type:2 0 0
cnt
STATION_IDLE
And that sits in the loop waiting for a reconnect.. NOW I reconnect (by renaming) the WIFI
dhcp client start…
MQTT: queuing publish, length: 17, queue size(57/2048)
STATION_IDLE
STATION_IDLE
STATION_IDLE
DNS: Found, but got no ip, try to reconnect
STATION_IDLE
STATION_IDLE
STATION_IDLE
ip:192.168.0.9,mask:255.255.255.0,gw:192.168.0.138
TCP: Connect to domain home.scargill.org:1884
DNS: found ip 109.170.132.114
TCP: connecting…
MQTT: queuing publish, length: 17, queue size(76/2048)
TCP: Reconnect to home.scargill.org:1884
TCP: Connect to domain home.scargill.org:1884
TCP:Reconect to: home.scargill.org:1884
DNS: found ip 109.170.132.114
TCP: connecting…
MQTT: queuing publish, length: 17, queue size(95/2048)
TCP: Reconnect to home.scargill.org:1884
MQTT: queuing publish, length: 17, queue size(114/2048)
TCP: Connect to domain home.scargill.org:1884
TCP:Reconect to: home.scargill.org:1884
DNS: found ip 109.170.132.114
TCP: connecting…
TCP: Reconnect to home.scargill.org:1884
MQTT: queuing publish, length: 17, queue size(133/2048)
TCP: Connect to domain home.scargill.org:1884
TCP:Reconect to: home.scargill.org:1884
DNS: found ip 109.170.132.114
TCP: connecting…
TCP: Reconnect to home.scargill.org:1884
MQTT: queuing publish, length: 17, queue size(152/2048)
TCP: Connect to domain home.scargill.org:1884
TCP:Reconect to: home.scargill.org:1884
DNS: found ip 109.170.132.114
TCP: connecting…
MQTT: queuing publish, length: 17, queue size(171/2048)
TCP: Reconnect to home.scargill.org:1884
MQTT: queuing publish, length: 17, queue size(190/2048)
TCP: Connect to domain home.scargill.org:1884
TCP:Reconect to: home.scargill.org:1884
DNS: found ip 109.170.132.114
TCP: connecting…
TCP: Reconnect to home.scargill.org:1884
MQTT: queuing publish, length: 17, queue size(209/2048)
TCP: Connect to domain home.scargill.org:1884
TCP:Reconect to: home.scargill.org:1884
DNS: found ip 109.170.132.114
TCP: connecting…
TCP: Reconnect to home.scargill.org:1884
MQTT: queuing publish, length: 17, queue size(228/2048)
Note the problem – it is connected – or so it says – but the queue is still building up – as if it is unable to send anything out.
And that, right now renders the code useless as there will always be problems with WIFI. If anyone has any ideas do let me know – this has to apply to other setups to one extent or another. According to Espressif they have solved some issues with reconnects in the latest SDK – well, I’ve overwritten the old one and recompiled so unless I’m missing some trick in the compile process I’m definitely using the latest SDK.
Hi Peter,
In the log segment showing the reconnection attempts, I don’t see a record of the steps to establish the mqtt connection again:
MQTT: Connected to broker home.scargill.org:1884
MQTT: Sending, type: 1, id: 0000
TCP: Sent
TCP: data received 4 bytes
MQTT: Connected to home.scargill.org:1884
MQTT: Connected
… and the subscriptions…etc.
I’m a MQTT newbie, but doesn’t that mqtt initiation need to be repeated when the connection is re-established?
I’ve had little time to play this week but I hope to do some tests and development soon.
What isn’t clear in your post is whether the Wifi client is actually reconnecting (forgetting about the MQTT piece for a sec). When the ESP indicates a reconnect is is pingeable on the network? Does your router indicate it is connected?
Reading that log I would assume at a glance that the Wifi has reconnected but something in MQTT is off. Another idea what if you eliminate MQTT and build one of the more basic Wifi examples (maybe the web server) does it behave the same way?
It’s weird Jeff.. almost always it is reconnecting as I can send a message and see it respond, but then when I do that, the queue starts filling up again – so I’ve seen a WIFI BREAK/MAKE situation where I’ve been able to send messages to the board but it’s replies start queueing up.
I am going to try something a little simpler… problem is I’m not a whizz at this yet and the MQTT software is a good base… I’mn using the latest SDK and the latest MQTT software with a replacement MQTT.C file while MINH (I keep getting his name the wrong way around and calling him TUAN, sorry about that) sent me this morning to no avail.
I know it along shot, but maybe the problem is with the MQTT (like Jeff suggested ).
You all ways described the client side, maybe it’s the Broker? Try a different Broker like RSMB.
After all they all (Client and Broker) use the same WiFi.
I can test that by putting MQTT-SPY on the laptop and having that lose WIFI and reconnect over and over… good point
Got the same problems as you described… but Ran was right. I changed the broker and finaly it works perfectly 🙂
I tried this broker (for free) cloudmqtt.com
Hi Peter,
I too am using the MQTT C code with SDk 0.9.5.
I currently have no checks for disconnection etc… however the wifi seems to automatically handle this, and my MQTT reconnects after a minute or so and dumps the back log of messages.
As i said, I wasn’t sure how to check for disconnects, so I am using the standard MQTT code with MQTT connection in the wifi connection call back routine, and that is it.
I am off to play rugby now, but may be able to email the code later. I am also using your DS18B20 code from a few days ago.
Just be warned, i normally on code in c# or for PLCs so my c is pretty poor.
Regards,
Aaron
#include "ets_sys.h"
#include "driver/uart.h"
#include "osapi.h"
#include "mqtt.h"
#include "wifi.h"
#include "config.h"
#include "debug.h"
#include "gpio.h"
#include "user_interface.h"
#include "mem.h"
#include "os_type.h"
#include "driver/ds18b20_scar.h"
MQTT_Client mqttClient; //mqtt client intialise
#define DELAY 5000 /* milliseconds */
#define sleepms(x) os_delay_us(x*1000);
LOCAL os_timer_t ds18b20_timer; //create local timer for temp readings
extern int ets_uart_printf(const char *fmt, ...);
int (*console_printf)(const char *fmt, ...) = ets_uart_printf;
bool mqtt_connected;
int ICACHE_FLASH_ATTR ds18b20_read()
{
//from scargill blog, for reading temp from his library
ds_reset();
ds_write(0xcc,1);
ds_write(0xbe,1);
int temperature;
temperature=(int)ds_read();
temperature=temperature+(int)ds_read()*256;
temperature/=16;
if (temperature>100) temperature-=4096;
ds_reset();
ds_write(0xcc,1);
ds_write(0x44,1);
return temperature;
}
void wifiConnectCb(uint8_t status) //wifi connection check callback?
{
if(status == STATION_GOT_IP){
MQTT_Connect(&mqttClient);
console_printf("IP Address Assigned\r\n"); //debug print
}
}
void mqttConnectedCb(uint32_t *args) //mqtt connected callback?
{
MQTT_Client* client = (MQTT_Client*)args;
INFO("MQTT: Connected\r\n");
console_printf("MQTT: Connected\r\n");
MQTT_Subscribe(client, "/ESP/commands", 0);
console_printf("MQTT: Subscribed to ESP\r\n");
mqtt_connected = true;
}
void mqttDisconnectedCb(uint32_t *args) //callback for disconnect?
{
MQTT_Client* client = (MQTT_Client*)args;
INFO("MQTT: Disconnected\r\n");
mqtt_connected = false;
}
void mqttPublishedCb(uint32_t *args) //callback for published?
{
MQTT_Client* client = (MQTT_Client*)args;
INFO("MQTT: Published\r\n");
}
void mqttDataCb(uint32_t *args, const char* topic, uint32_t topic_len, const char *data, uint32_t data_len) //callback for data arrival?
{
char *topicBuf = (char*)os_zalloc(topic_len+1),
*dataBuf = (char*)os_zalloc(data_len+1);
MQTT_Client* client = (MQTT_Client*)args;
os_memcpy(topicBuf, topic, topic_len);
topicBuf[topic_len] = 0;
os_memcpy(dataBuf, data, data_len);
dataBuf[data_len] = 0;
console_printf("Receive topic: %s, data: %s \r\n", topicBuf, dataBuf);
if (strcmp(dataBuf, "1") == 0)
{
//if 1 do something
console_printf("It was a 1!!!\r\n");
}
else if (strcmp(dataBuf, "2") == 0)
{
//if 2 do something
console_printf("It was a 2!!!\r\n");
}
else if (strcmp(dataBuf, "3") == 0)
{
//if 3 do something
console_printf("It was a 3!!!\r\n");
}
else
{
//if unrecognised do something
console_printf("Unrecognised Command \r\n");
}
os_free(topicBuf);
os_free(dataBuf);
}
LOCAL void ICACHE_FLASH_ATTR ds18b20_cb(void *arg)
{
//routine here is run on timer
//prepare buffer for temperature (string)
int size = 4;
char buffer [size];
int nchars;
int t = ds18b20_read(); //start temp reading
nchars = os_sprintf(buffer, "%d",t);
console_printf("Read Temperature: %d \r\n",t);
if(mqtt_connected == true)
{
MQTT_Publish(&mqttClient,"/ESP/temp",buffer,nchars,0,0);
}
}
void user_init(void)
{
uart_init(BIT_RATE_115200, BIT_RATE_115200);//start uart connection for debug
os_delay_us(3000000); //delay for serial connections
CFG_Load(); //load user settings
console_printf("\r\nInitialising\r\n"); //debug print
//setup temp readings on timer
ds_init(14); // initialise ds18b20 sensor on gpio14
os_timer_disarm(&ds18b20_timer); //disarm temp timer
os_timer_setfn(&ds18b20_timer, (os_timer_func_t *)ds18b20_cb, (void *)0); //set the cb to use
os_timer_arm(&ds18b20_timer, DELAY, 1); //start timer with delay
//end of temp setup
//MQTT init
//MQTT_InitConnection(&mqttClient, sysCfg.mqtt_host, sysCfg.mqtt_port, sysCfg.security);
MQTT_InitConnection(&mqttClient, "192.168.0.13", 1883, 0);
//MQTT_InitClient(&mqttClient, sysCfg.device_id, sysCfg.mqtt_user, sysCfg.mqtt_pass, sysCfg.mqtt_keepalive, 1);
MQTT_InitClient(&mqttClient, "ESP1", "", "", 120, 1);
MQTT_InitLWT(&mqttClient, "/lwt", "offline", 0, 0);
MQTT_OnConnected(&mqttClient, mqttConnectedCb);
MQTT_OnDisconnected(&mqttClient, mqttDisconnectedCb);
MQTT_OnPublished(&mqttClient, mqttPublishedCb);
MQTT_OnData(&mqttClient, mqttDataCb);
//end of mqtt init
//wifi connection
WIFI_Connect(sysCfg.sta_ssid, sysCfg.sta_pwd, wifiConnectCb);
}
//TODO checks for wifi connection is unsuccesful
//TODO toggle LED or SSR from MQTT
//TODO read ADC and send over MQTT
//TODO Implement sleeping, to save battery life???
/if your PC is wired or on another WIFI it’s easy to check for disconnection.. just rename the WIFI… on my Draytek it’s a case of changing a letter and hitting ok… I can name back and forth..
What I discovered this morning was that even after rebooting the board I wasn’t necessarily getting everything back up and running so I’m wondering if the queue software is working properly. So there’s a 2k queue and that handles incoming and outgoing… I don’t understand it well enough – the subscribes may or may not be in a queue, their messages coming in may or may not be in a queue – I don’t have enough info – but outgoing messages are definitely in a queue.. but as they are not in FLASH I don’t understand how any of that could affect things after a power cycle – you would think that no matter what, after a power cycle everything would come up properly but that’s not what I’m getting.
Pete,
I’m still seeing -very- intermittent problems with the ARP cache with my modules (and this isn’t related to any specific application). It -does- seem to be related to the router (most of my access points are running DDWRT or OpenWRT). When the module comes back on line it sometimes seems to fail to respond to ARP broadcast requests and the ARP cache on the other machines on the local net lose their entry for the device if the connection has been down for long enough. The symptoms are that the ESP module can “see” the other machines, but the other machines are unable to route packets back to the ESP, as they don’t have a valid MAC address and the ESP doesn’t respond to broadcasts.
I have no idea how to do this on Windows, but it might be interesting to see what the ARP cache entry for the module looks like on your server when you see the problem occur. Under Un*xes, the command would be
arp -a
(as any user) to show all current entries in the cache. As an administrator, you can manually add your module back into the cache (if it’s missing) usingarp -s IP_ADDRESS MAC_ADDRESS
(all of which should become obvious once you’ve seen the existing entries using the “-a” option).-John-