ESP SDK and RBoot Woes

Remember I did an article entitled Serial Woes? Well, that has since turned into serial success: we found bugs in tools and gained an understanding of the USB FTDI handshaking lines, which together led to improvements in the Node-Red Arduino node and a completely working solution for USB serial on the Raspberry Pi.

Well, for the last few days I’ve been at the WOES stage of a serious issue with our ESP8266 code. I’m hoping that Espressif and Richard of RBoot fame will see this and that we can get a little interaction, so that what looks like an issue between Espressif SDK 1.5.4 and RBoot can be resolved. I’ll update this as we go along.

So – those of you who’ve been reading this blog for a while will know that I’ve put a lot of work (along with a lot of help from others) into developing my own ESP8266 code to control ESP-12-based boards designed by Aidan Ruff, with the software talking via MQTT back to a central controller. This is all working and is regularly updated. The software is BIG (well over 0.5 meg, so for OTA purposes it needs boards with 4MB of FLASH like the ESP-12, though currently it will work without OTA on the likes of the Sonoff controllers) and is available as source code and .ROM files.

So a few days ago some guys in the blog wrote in to say they were having horrible difficulty flashing my ROMs and indeed that their boards were apparently broken!!!

I could not replicate this behaviour – I would take my ROMS, blow them – and all was well, my code worked. The code uses Richard Burton’s RBOOT to allow for OTA (over the air updating).

And then… I tried blowing the ROMS into a virgin ESP8266 board (and I mean virgin with no software on it – not one stocked with AT software). DEAD. But not just dead. Dead sometimes… and when it worked – it worked kind of.

[Image: serial output from a normal boot]

Here’s what I mean. Look at the first image above – that’s a normal boot-up of my code: “ESP starting…”, some info on the use of GPIOs etc., the software version and, right at the end, the output of another function which prints “Web page control enabled” (or disabled).

Now look at the image below. This is the output when I CAN get the code to work with SDK 1.5.4.1.

[Image: serial output when built with SDK 1.5.4.1]

Do you see what is wrong? The “Web page control enabled” message comes up FIRST. You might say that is IMPOSSIBLE, as that code does not run until after the other stuff is all done, and there are bits missing too. More often than not, though, the code does not run at all – and that is what pointed me to the problem and the “solution”.

Here are the sequences of events:

  • Start with an existing ESP
  • Flash my code
  • Output probably ok
  • Start with something like the NodeMCU software rom
  • Flash my code
  • Output seems ok but has the above nonsense (but works)
  • Entirely WIPE the FLASH (yes, there is a quick way)
  • Flash the new code
  • Nothing works
  • Load the NodeMCU rom (works mainly)
  • Flash my code
  • Works sometimes – might not work after reboot – but might.

This is cutting a LONG story short – a story of using various programs to try to wipe the 4MBYTES of the ESP-12 only to find in one case that the final meg would not erase – or in another case that no matter what address you put in it was always ONLY the bottom MB that was being erased – that wasted a few hours I can tell you!!!

Finally I hit on this – more by accident than anything else:

C:\Espressif\utils>esptool.py -p COM3 -b 115200 erase_flash

This took an instant and appears to wipe the entire Flash perfectly.

Even the official tool seemed to work, but when you watched the addresses go by it was obvious that only the bottom 1 meg was being erased!!
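
One way to double-check a wipe, rather than trusting the addresses scrolling past, is to read the flash back and confirm that every byte is 0xFF (the erased state of this kind of flash). What follows is only a minimal sketch: it assumes the dump was made with esptool's read_flash command (for example esptool.py -p COM3 -b 115200 read_flash 0 0x400000 dump.bin), and the function names and the file name dump.bin are my own inventions, not part of any tool.

```python
MB = 1024 * 1024


def first_programmed_offset(data: bytes):
    """Return the offset of the first byte that is not 0xFF, or None if all are."""
    for offset, value in enumerate(data):
        if value != 0xFF:
            return offset
    return None


def erased_megabytes(data: bytes):
    """Return one flag per 1 MB block: True if that block is fully erased."""
    flags = []
    for start in range(0, len(data), MB):
        chunk = data[start:start + MB]
        flags.append(chunk == b"\xff" * len(chunk))
    return flags


# Hypothetical usage against a 4 MB dump:
#   with open("dump.bin", "rb") as f:
#       print(erased_megabytes(f.read()))
```

A programmer that only erases the bottom meg would show up here as True for the first block and False for one or more of the others, assuming the upper blocks had something in them beforehand.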

Wipe the Lot

But I digress.

So when I finally got to analysing the problem (scratching my last hairs out) and realised that something was wrong – maybe a start-up vector in RAM not being set properly – I decided to try something. Instead of using the latest Espressif SDK 1.5.4.1 (which, coincidentally, I loaded and compiled at about the time people started griping about non-working boards, though of course I was simply OTA-ing my existing, working boards and seeing nothing), I went back a version to 1.5.3 – and….

IMMEDIATELY (thank heavens) the problem went away – I could now program the boards in the normal way and they worked (work) perfectly. 

So at www.scargill.net there are some working ROMs built with SDK 1.5.3:

rboot.bin and romx.bin

The above work (for example http://www.scargill.net/romx.bin)

Here is the same code compiled under 1.5.4.1:

rboot_duff_154.bin  and romx_duff_154.bin

These are the ones that were up there until now and were giving trouble; I’ve just renamed them. It is easy to test: after running the erase procedure above, put the first file of a pair at 0x00000 and the second at 0x2000 (that’s 0x2000, NOT 0x20000). One pair will work – the other will not.
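
For anyone scripting the test, here is a rough sketch of building the esptool.py command line for one pair of files. The port COM3 and the 115200 baud rate are simply copied from the erase command above, the helper name write_flash_args is hypothetical, and I have left out any flash mode, size or speed options, so add whatever your own board needs.

```python
def write_flash_args(port, baud, images):
    """Build an esptool.py argument list; images maps flash offset -> file name."""
    args = ["esptool.py", "-p", port, "-b", str(baud), "write_flash"]
    for offset in sorted(images):
        # Five hex digits make 0x02000 hard to confuse with 0x20000.
        args += [f"0x{offset:05x}", images[offset]]
    return args


# The working (SDK 1.5.3) pair: rBoot at the bottom of flash, the ROM at 0x2000.
GOOD = {0x00000: "rboot.bin", 0x02000: "romx.bin"}

print(" ".join(write_flash_args("COM3", 115200, GOOD)))
```

Swapping in rboot_duff_154.bin and romx_duff_154.bin gives the command for the failing pair, and the resulting list can be handed to subprocess.run if you would rather not paste it by hand.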

So – the question now is: is there something wrong with the Espressif SDK 1.5.4.1, or have they moved or changed something that is giving Richard’s RBOOT (which has not changed for a long while and which has proven to work perfectly) trouble? Surely it has to be one or the other. My impression, rightly or wrongly, is that the code is starting up in the wrong place – sometimes in the middle of nowhere, sometimes almost in the right place (hence the re-arrangement of output).

I’ve pointed both Richard and Espressif to this, as it is possible to replicate the issues quite easily on an ESP-12 (my viewable serial output at power-up is at 115k baud, after the usual debug info at 74880 baud).

Hopefully this will become a shorter success story with a fix somewhere. If it is affecting my code, it has to be affecting other code (and no, I’m not running out of variable RAM – thought of that). Right now I simply cannot use SDK 1.5.4.1.

Thoughts?

20 thoughts on “ESP SDK and RBoot Woes”

      1. Any updates on this issue? I am trying to figure out the memory mapping/usage and discovered rboot. After some time reading I believe the BOOT_RTC_ENABLED and BOOT_BIG_FLASH will let my applications function correctly. But of course it’s useless since SDK 2.x is here and I’ve seen nothing about your issues getting addressed.
        Thanks

        1. The issue was resolved some time ago – by blowing the third file on a new installation and making a minor change to my startup code. All has been fine ever since.

    1. Current version up there 1.5.21 is based on SDK 1.5.2 and latest RBOOT. Works well here. Source updated, roms updated.

  1. I have things working just fine with the latest rBoot and SDK 1.5.4.1. But I will say that I *was* having issues initially when I first tried out the patched version.

    I’m no expert on memory mapping but I believe the issue is that depending on the way you have your rBoot ROMs set up you have to flash init data and blank.bin to 2 different sectors. Here are the steps I took:

    1. Recompile esp-open-sdk with the patched libraries
    2. Implement user_rf_cal_sector_set() in user_main.c as described in the updated API guide. I use their code verbatim which I believe ends up using rf_cal_sec = 1024 – 5; when using rBoot 2 split ROMs on a 4MB esp-12e (FLASH_SIZE_32M_MAP_1024_1024).
    Note that this is mandatory, but you do not need to call the method anywhere as the SDK calls it itself.
    3. Now here’s what ended up fixing things for me. I use a split rBoot 2 ROM setup on a 4MB 12e. If I want to “reset” the rf_cal, I add these to my makefile flash command:

    +blank.bin #1:
    0x1FE000 blank.bin

    +esp_init_data.bin #1:
    0x1FC000 esp_init_data.bin

    +blank.bin #2:
    0x3FE000 blank.bin

    +esp_init_data.bin #2:
    0x3FC000 esp_init_data.bin

    When you flash those, you’ll notice on the first boot that a bunch of gibberish spits out for a second or two (that would not normally be there) and then rBoot boots as usual. You can see what it’s spitting out if you monitor @ 74880 baud.

    I have done plenty of OTA updates with this setup.
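
The sector numbers and flash addresses in this thread are related by the 4 KB sector size, which is easy to get wrong by hand. A small sketch of the arithmetic follows; it assumes, going by the rf_cal_sec = 1024 – 5 quoted above, that the RF calibration sector is the fifth sector from the end of the flash, and the function names are made up purely for illustration.

```python
SECTOR_SIZE = 0x1000  # ESP8266 SPI flash is erased in 4 KB sectors


def sector_to_address(sector):
    """Flash byte address at which a sector starts."""
    return sector * SECTOR_SIZE


def address_to_sector(address):
    """Sector number that contains a flash byte address."""
    return address // SECTOR_SIZE


def rf_cal_sector(flash_bytes):
    """Assumed rule: total number of sectors minus 5 (1024 - 5 on a 4 MB part)."""
    return flash_bytes // SECTOR_SIZE - 5


sector = rf_cal_sector(4 * 1024 * 1024)
print(sector, hex(sector_to_address(sector)))   # 1019 0x3fb000
print(address_to_sector(0x3FC000), address_to_sector(0x3FE000))   # 1020 1022
```

So on a 4 MB board the calibration data sits at 0x3FB000, in the sector immediately below the esp_init_data.bin flashed at 0x3FC000.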

    1. PS:

      Also note that if you are using SPIFFS you will have to make sure you map around those locations. That is, make sure SPIFFS start location + SPIFFS size does not overlap at all with the 0x1FC000 – 0x1FE000 or 0x3FC000 – 0x3FE000 regions.
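
A quick way to sanity-check a SPIFFS layout against those regions is sketched below. It makes one assumption beyond the comment above: each region is taken to run from the esp_init_data.bin address to the end of the 4 KB sector that blank.bin is written into (0x1FF000 and 0x3FF000). The names RESERVED and overlapping_regions are hypothetical, so adjust the list to match your own map.

```python
# Half-open [start, end) byte ranges that SPIFFS must stay clear of (assumed).
RESERVED = [(0x1FC000, 0x1FF000), (0x3FC000, 0x3FF000)]


def overlapping_regions(start, size, reserved=RESERVED):
    """Return the reserved ranges that a region of `size` bytes at `start` touches."""
    end = start + size
    return [(lo, hi) for lo, hi in reserved if start < hi and lo < end]


# A file system that stops exactly at 0x1FC000 is fine...
print(overlapping_regions(0x100000, 0xFC000))   # []
# ...but one that fills the whole second meg is not.
print([(hex(lo), hex(hi)) for lo, hi in overlapping_regions(0x100000, 0x100000)])
```

An empty list means the chosen start and size are clear of the listed ranges. It says nothing about other areas the SDK or rBoot may be using, such as the RF calibration sector.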

    2. If I understand the maps – as I’m using RBOOT.bin and ROMX.bin – rboot goes to the bottom of the first meg and romx goes to 2000 on first or second meg – depending on which is needed.. and way up at the top of the 4th meg is the user data – which I assume the SDK will load.

      So I tried this…. for the sake of it…

      $(ESPTOOL) -p $(ESPPORT) -b $(ESPBAUD) write_flash $(FLASH_OPTS) 0x00000 $(RBOOT_BIN) 0x02000 $(RBOOT_ROM_0) 0x1FE000 $(NEWSTUFF)/blank.bin 0x1FC000 $(NEWSTUFF)/esp_init_data_default.bin 0x3FE000 $(NEWSTUFF)/blank.bin 0x3FC000 $(NEWSTUFF)/esp_init_data_default.bin

      i.e. adding in everything from 1fe000 upwards – made absolutely no difference…

      Here is the 1.5.2 output, at 74880 baud then at 115k.

      ets Jan 8 2013,rst cause:2, boot mode:(3,7)

      load 0x40100000, len 1344, room 16
      tail 0
      chksum 0x9c
      load 0x3ffe8000, len 660, room 8
      tail 12
      chksum 0xb8
      csum 0xb8

      rBoot v1.3.0 – richardaburton@gmail.com
      Flash Size: 32 Mbit
      Flash Mode: QIO
      Flash Speed: 80 MHz
      rBoot Option: Big flash

      Booting rom 0.

      and the 115k bits…

      ESP Starting…
      GPIO4 and 5 are outputs.
      Current web programming pin: 2
      GPIO13 is a serial RGB LED indicator.
      Software version 1.5.2
      Use the {debug} command for more information.

      STATION mode
      Web page control enabled

      and here is the SDK version 2 code..slightly different…

      ets Jan 8 2013,rst cause:2, boot mode:(3,7)

      load 0x40100000, len 1344, room 16
      tail 0
      chksum 0x9c
      load 0x3ffe8000, len 660, room 8
      tail 12
      chksum 0xb8
      csum 0xb8

      rBoot v1.3.0 – richardaburton@gmail.com
      Flash Size: 32 Mbit
      Flash Mode: QIO
      Flash Speed: 80 MHz
      rBoot Option: Big flash

      Booting rom 0.

      And the 115k output which has an entire block missing.

      STATION mode
      Web page control enabled

      1. Yeah so this is what mine spits out after I flash the bins to the end of the 4MB:
        (and for anyone looking at my original comment it looks like I was wrong and only the 2nd set of bin flashes are needed at the end of the 4mb/0x3fxxxx)

        Flash Size: 32 Mbit
        Flash Mode: QIO
        Flash Speed: 80 MHz
        rBoot Option: Big flash
        rBoot Option: Config chksum
        rBoot Option: irom chksum

        Booting rom 0.
        timeZ��6-�,5␆��{�}N| rf cal sector: 1019
        rf[112]�@���h�)���␐␔�␔␔�)m�uqXQH��Pi␐����0�@␜

        The rest is gibberish and I’m not sure what baud rate that’s at, or whether it spits out anything intelligible anyway.

  2. Weird. All the flash erase issues and flash programming issues you mention have nothing to do with the SDK or the stage 2 bootloader, given that it’s all done by the ROM. So you seem to be pointing at issues in the ROM in a recent(?) batch of chips?
    I’ve also had problems at boot time with SDK 1.5.4.1 in Espruino and haven’t really gotten to the bottom of it.

    1. The response in Espressif forum suggests that they acknowledge they made changes in 1.5.4 which may be affecting RBOOT. That’s as far as it goes up to now and I’ve heard nothing from Richard.

    2. The erase issue was I believe to do with the particular programmer – and all the units program…. the problem was when it came to booting I think they were booting up in the wrong place. Now resolved totally with 1.5.3 – awaiting a solution for 1.5.4 – Espressif suggested their OTA works fine but I’m used to using RBOOT now and it’s a big learning curve to rehash the code and the Makefile (well, it is for me anyway).

  3. Hi Peter.

    I use rboot too, and after updating the SDK (to 1.5.4) I had problems with STATION mode.
    I could not change the SSID/PWD.

    I tried the code (https://github.com/tishtosh/esp-link/) with SDK 1.5.4 and STATION mode works.

    So I think that the new SDK and rboot are not compatible.

  4. Maybe time to ask Espressif what the description of issue #3 in the 1.5.4-1 patch actually means?

    “3 – Fix system state mismatch when call some cur and def APIs. (Resolved in ESP8266_NONOS_SDK_V1.5.4.1)”
