Remember I did an article entitles Serial Woes? Well that’s been changed to serial success as we found bugs in tools and gained an understanding of the USB FTDI handshaking lines – which altogether led to improvements in the Node-Red Arduino node and a completely working solution for USB Serial on the Raspberry Pi.
Well for the last few days I’ve been at the WOES part of a serious issue with our ESP8266 code. I’m hoping if Espressif see this and Richard of RBoot fame see this and we can get a little interaction – so that what looks like an issue with Espressif SDK 1.5.4 and Rboot can be resolved. I’ll update this as we go along.
So – those of you who’ve been reading this blog for a while will know that I’ve put a lot of work into developing my own code (along with a lot of help from others) for the ESP8266 to control ESP-12-based boards designed by Aidan Ruff – the software talking via MQTT back to a central controller – this is all working and is regularly updated. The software is BIG (well over 0.5 meg and hence for OTA purposes needs boards with a 4MB FLASH like the ESP-12 though currently it will work without OTA on the likes of the Sonoff controllers) – and is available as source code and .ROM files.
So a few days ago some guys in the blog wrote in to say they were having horrible difficulty flashing my ROMs and indeed that their boards were apparently broken!!!
I could not replicate this behaviour – I would take my ROMS, blow them – and all was well, my code worked. The code uses Richard Burton’s RBOOT to allow for OTA (over the air updating).
And then… I tried blowing the ROMS into a virgin ESP8266 board (and I mean virgin with no software on it – not one stocked with AT software). DEAD. But not just dead. Dead sometimes… and when it worked – it worked kind of.
Here’s what I mean. Look at the first image above – that’s a normal boot up of my code. “ESP starting…” some info on the use of GPIOs etc., software version and right at the end, another function is called which outputs “Web page control enabled” (or disabled).
Now look at the code below.. this is the output when I CAN get the code to work with SDK 22.214.171.124.
Do you see what is wrong – the “web page control enabled” message comes up FIRST – you might say that is IMPOSSIBLE – that code does not run until after the other stuff is all done… and there are bits missing – but more often than not the code does not run at all - and that is what pointed me to the problem – and the “solution”.
Here are sequences of events..
- Start with an existing ESP
- Flash my code
- Output probably ok
- Start with something like the NodeMCU software rom
- Flash my code
- Output seems ok but has the above nonsense (but works)
- Entirely WIPE the FLASH (yes, there is a quick way)
- Flash the new code
- Nothing works
- Load the NodeMCU rom (works mainly)
- Flash my code
- Works sometimes – might not work after reboot – but might.
This is cutting a LONG story short – a story of using various programs to try to wipe the 4MBYTES of the ESP-12 only to find in one case that the final meg would not erase – or in another case that no matter what address you put in it was always ONLY the bottom MB that was being erased – that wasted a few hours I can tell you!!!
Finally I hit on this – more by accident than anything else:
C:\Espressif\utils>esptool.py -p COM3 -b 115200 erase_flash
This took an instant and appears to wipe the entire Flash perfectly.
Even using the official tool would seem to work – but when you watched the addresses go by – it was obvious that only the bottom 1 meg was being erased!!
But I digress.
So when I finally got to analysing the problem (scratching my last hairs out) and realising that something was wrong – maybe a start up vector in RAM not being set properly, I decided to try something – instead of using the latest Espressif SDK 1.541 (which coincidentally I loaded and compiled at about the time people started griping about non-working boards – but of course I was simply OTA-ing my existing, working boards and seeing nothing) – I went back a version to 1.53 – and….
IMMEDIATELY (thank heavens) the problem went away – I could now program the boards in the normal way and they worked (work) perfectly.
So at www.scargill.net there are some working roms using SDK 1.53
rboot.bin and romx.bin
The above work (for example http://www.scargill.net/romx.bin)
Here is the same compiled under 1.541
rboot_duff_154.bin and romx_duff_154.bin
These are the ones that were up there until now – and were giving trouble. I’ve just renamed them. Easy to test – put the first at 0x00000 and the second at 0x2000 (that’s 0x2000 NOT 0x20000) after running that erase procedure above – and one will work – the other will not.
So – the question is now – is there something wrong with the Espressif SDK 1.541 – or have they moved or changed something that is giving Richard’s RBOOT (which has not changed for a long while and which has proven to work perfectly) trouble. Surely it has to be one or the other. My impression, rightly or wrongly is that the code is starting up in the wrong place – sometimes in the middle of no-where, sometimes almost in the right place (hence the re-arrangement of output).
I’ve pointed both Richard and Espressif to this as it is possible to replicate the issues quite easily on an ESP-12 (my viewable serial output at powerup is 115k baud after the usual 78k debug info).
Hopefully this will become a shorter success story with a fix somewhere. If it is affecting my code – it has to be affecting other code (and no I’m not running out of variable RAM – thought of that). Right now I simply cannot use SDK 1.541