The ESP8266 Grand Master Plan

So… here’s where I’m up to with my little project… currently stalled until we can figure out why the ESP8266 is not recovering from loss of WIFI.

The plan

This is “the plan”.  Using the basic MQTT code, I’ve added the ability to control GPIO0 and take input from, say, a Dallas DS18B20 chip.  I have the ESP8266 on a little board with a 3v3 regulator to ensure it gets decent quality power from a typical cheap 5v supply.  The MQTT software uses a block at 3c000 to store non-volatile information and it’s only using a small part of the 4K available.

Why 4K? Because that’s the minimum size of FLASH you can erase – and if you can’t erase, you can’t write. Here is my understanding of the FLASH. You can write a 0 to it – but not a 1. So when you erase the FLASH – which can only be done in 4K blocks as a minimum in this case – the bytes are all set to 255, or 0xFF if you prefer Hex, or 11111111 in binary.
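To make that rule concrete, here is a minimal sketch in C that models one 4K block in RAM. The names `sim_erase`, `sim_write` and `sim_read` are made up for illustration and are not the SDK's flash calls; the only point is that an erase sets every byte to 0xFF and a write can only clear bits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 4096u  /* smallest erasable unit, per the text above */

/* Hypothetical in-RAM stand-in for one flash sector (not the SDK API). */
static uint8_t sector[SECTOR_SIZE];

/* Erase: every bit in the sector goes back to 1, so every byte reads 0xFF. */
static void sim_erase(void) {
    memset(sector, 0xFF, SECTOR_SIZE);
}

/* Program: bits can only go from 1 to 0, modelled here as a bitwise AND. */
static void sim_write(uint32_t addr, uint8_t value) {
    assert(addr < SECTOR_SIZE);
    sector[addr] &= value;
}

static uint8_t sim_read(uint32_t addr) {
    assert(addr < SECTOR_SIZE);
    return sector[addr];
}
```

Writing 0xF0 and then 0x0F to the same byte in this model leaves 0x00, not 0x0F; only another erase brings the 1s back.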

You can then write to that block – and read it as often as you like. The path that Tuan has taken to store user info, passwords etc is to use a 16K block at 3c000, use the first and second blocks to store information (the same information), ignore the third block and use one byte of the 4th block to tell you which of the first two has accurate info. The reasoning for this becomes obvious after a while: you don’t want to lose your info in the event of a power failure during erase or program. So, let’s say you have a binary flag in the top block set to 1. That means that the existing data is held in the second block (block 1). You wish to change something – read the data into RAM, erase block ZERO, write the updated data to block zero and update that flag accordingly – if any of that fails, the good data is still in block 1; if it succeeds, the software will know to use block zero from now on when powering up to get the data. Simples. GROSSLY wasteful, but simple.
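As a rough sketch of that two-copies-plus-flag idea (my reading of it, not Tuan's actual code), here it is in C against a RAM model of the 16K area. The `config_t` layout, the `sim_*` helpers and the `cfg_*` function names are all assumptions for illustration; on real hardware the erase and write would go through the SDK's flash routines.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 4096u
#define NUM_SECTORS 4u     /* the 16K area: copy 0, copy 1, unused, flag */
#define FLAG_SECTOR 3u

/* Hypothetical RAM model of the 16K area (not the SDK API). */
static uint8_t flash[NUM_SECTORS][SECTOR_SIZE];

typedef struct {           /* illustrative payload, not Tuan's real layout */
    char ssid[32];
    char password[64];
} config_t;

static void sim_erase(uint32_t s) { memset(flash[s], 0xFF, SECTOR_SIZE); }

static void sim_write(uint32_t s, const void *src, uint32_t len) {
    const uint8_t *p = src;
    assert(s < NUM_SECTORS && len <= SECTOR_SIZE);
    for (uint32_t i = 0; i < len; i++) flash[s][i] &= p[i];  /* 1 -> 0 only */
}

/* Which copy is current: 1 if the flag byte is 1, otherwise 0.
   An erased flag byte (0xFF) therefore also selects copy 0. */
static uint32_t active_sector(void) {
    return flash[FLAG_SECTOR][0] == 1 ? 1u : 0u;
}

static void cfg_load(config_t *out) {
    memcpy(out, flash[active_sector()], sizeof *out);
}

static void cfg_save(const config_t *in) {
    uint32_t target = active_sector() ^ 1u;  /* the copy NOT in use */
    uint8_t flag = (uint8_t)target;

    sim_erase(target);
    sim_write(target, in, sizeof *in);
    /* Until the flag is rewritten below, the old copy is still untouched. */
    sim_erase(FLAG_SECTOR);
    sim_write(FLAG_SECTOR, &flag, 1);
}
```

Each `cfg_save` costs two erases (one data block, one flag block) however small the change, which is where the "GROSSLY wasteful" comes from.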

So as well as input, output and non-volatile data I’ve also added a real time clock able to go for days without update but as it happens my MQTT fires out the correct time every minute. Should the broadband fail however, the clock will still keep ticking.
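The clock idea can be sketched in a few lines of C. This is only an illustration under my own assumptions: the function names are invented, the time is assumed to arrive as seconds since 1970, and something else (a one-second timer) is assumed to call `clock_tick`.

```c
#include <stdint.h>

/* Seconds since 1970; 0 means "never synchronised". */
static uint32_t now_epoch;

/* Call once per second from a timer; keeps counting with no network. */
static void clock_tick(void) {
    if (now_epoch != 0) now_epoch++;
}

/* Call whenever the time message arrives over MQTT. */
static void clock_sync(uint32_t epoch_from_broker) {
    now_epoch = epoch_from_broker;
}

/* Minutes since midnight, handy for comparing against on-off times. */
static uint32_t clock_minutes_of_day(void) {
    return (now_epoch % 86400u) / 60u;
}
```

With a sync every minute any drift gets corrected almost immediately; without one, the count simply carries on from the last known time.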

Now that the system can send messages and receive them including subscriptions to the time, dawn and dusk and other info, I have a state variable which tells it what it is doing – that is described at the top – and involves sending a message with number 0 upwards – which gets stored in FLASH. So 0 turns the output off, 1 turns it on, all the way up to 5 which has it operating as a proper thermostat with 2 lots of on-off times.

All works a treat – except… it is not recovering from lost WIFI. So next steps – harassing Tuan to ensure his MQTT code is ok and asking him to check to see if the new SDK helps – personally I’m just not that sure how the new SDK is dropped in. The instant this works – I’ll put this in a box with the solid state relay jobs I detailed earlier – and – Bob’s your uncle.  First useful ESP8266 outcome. Now, given that there is plenty of code space, that APPARENTLY you can double the clock speed to 160MHz if you need more power and the availability of ESP8266 boards with many more pins – who knows how this will develop.


12 thoughts on “The ESP8266 Grand Master Plan”

  1. Pete, here a little scheme I have already used to optimize usage of flash to store parameters

    First, as you pointed out, using 1 flash-page (4K in that case) is not enough to be protected against sudden power loss. But 4 flash-pages (16K) is a bit too much. My solution uses only 2 flash-pages and uses them in whole (not only a few bytes at the beginning).
    The idea is that each data-block includes a payload (the data you want to store), which is much smaller than the flash-page, + 1 header byte. To make it simple, let's say you have 1 header-byte + 63 data-bytes.

    The header byte is split in 2 nibbles :
    higher nibble : data block status
    lower nibble : version number, because your software may evolve and require newer data fields, but you want new software to be able to read an existing data-block before it is written back.

    As an erased flash is all 1's, the status nibble would be interpreted as :
    1111b : free block, not yet used
    1110b : writing on going
    1100b : block valid
    0000b : block invalidated

    1) read procedure :
    - parse the data-blocks header bytes (jump every 64 bytes) until you find the 1st block with value 1100b (block valid). if you don't find any valid block, move to the other flash-page.
    - read the data (use version to interpret the data)

    2) write procedure
    a) Start from the block you've read before
    b) jump to the next block (+64 bytes). If you were at the last block of the page, erase next page and move to the beginning of it
    c) check that the block is free (1111). If not go back to b)
    d) write header-byte with 1110b in the upper nibble and 1111b in the lower nibble
    e) write data
    f) write header-byte as valid : upper=1100b and lower=version
    g) write header-byte of the previous block as invalid (all 0)

    This optimizes the usage of the flash as well as the life of the flash, as erases are reduced to a minimum.
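The read and write procedures above can be sketched in C roughly as follows. This is one possible reading, run against a RAM model of the two pages: the `sim_*` helpers, the `store_*` names, the `VERSION` value and the `erase_count` counter are all assumptions for illustration, and a page is only erased when its first block is not already free.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE   4096u
#define NUM_PAGES   2u
#define BLOCK_SIZE  64u                   /* 1 header byte + 63 data bytes */
#define DATA_SIZE   (BLOCK_SIZE - 1u)
#define PER_PAGE    (PAGE_SIZE / BLOCK_SIZE)
#define BLOCKS      (NUM_PAGES * PER_PAGE)

#define ST_FREE     0xFu                  /* 1111b */
#define ST_WRITING  0xEu                  /* 1110b */
#define ST_VALID    0xCu                  /* 1100b */
#define VERSION     0x1u                  /* hypothetical layout version */

static uint8_t flash[NUM_PAGES * PAGE_SIZE];  /* RAM stand-in, not the SDK */
static unsigned erase_count;                  /* to see how rare erases are */

static void sim_erase_page(uint32_t page) {
    memset(&flash[page * PAGE_SIZE], 0xFF, PAGE_SIZE);
    erase_count++;
}

static void sim_write(uint32_t addr, const uint8_t *src, uint32_t len) {
    assert(addr + len <= sizeof flash);
    for (uint32_t i = 0; i < len; i++) flash[addr + i] &= src[i]; /* 1 -> 0 */
}

static uint8_t status_of(uint32_t blk) { return flash[blk * BLOCK_SIZE] >> 4; }

/* Read step 1: index of the first valid block, or -1 if there is none. */
static int find_valid(void) {
    for (uint32_t b = 0; b < BLOCKS; b++)
        if (status_of(b) == ST_VALID) return (int)b;
    return -1;
}

/* Read step 2: copy the payload out; returns the block index or -1. */
static int store_read(uint8_t out[DATA_SIZE]) {
    int b = find_valid();
    if (b < 0) return -1;
    memcpy(out, &flash[(uint32_t)b * BLOCK_SIZE + 1u], DATA_SIZE);
    return b;
}

static int store_write(const uint8_t data[DATA_SIZE]) {
    int prev = find_valid();                                  /* (a) */
    uint32_t b = (prev < 0) ? 0u : ((uint32_t)prev + 1u) % BLOCKS;
    uint8_t hdr;

    for (uint32_t tries = 0; tries < BLOCKS; tries++) {
        /* (b) on entering a used page that does not hold the current
           data, erase it so that its blocks become free again */
        if (b % PER_PAGE == 0 && status_of(b) != ST_FREE &&
            (prev < 0 || (uint32_t)prev / PER_PAGE != b / PER_PAGE))
            sim_erase_page(b / PER_PAGE);
        if (status_of(b) == ST_FREE) break;                   /* (c) */
        b = (b + 1u) % BLOCKS;
    }
    if (status_of(b) != ST_FREE) return -1;

    hdr = (uint8_t)((ST_WRITING << 4) | 0xFu);                /* (d) */
    sim_write(b * BLOCK_SIZE, &hdr, 1);
    sim_write(b * BLOCK_SIZE + 1u, data, DATA_SIZE);          /* (e) */
    hdr = (uint8_t)((ST_VALID << 4) | VERSION);               /* (f) */
    sim_write(b * BLOCK_SIZE, &hdr, 1);
    if (prev >= 0) {                                          /* (g) */
        hdr = 0x00;
        sim_write((uint32_t)prev * BLOCK_SIZE, &hdr, 1);
    }
    return (int)b;
}
```

In this model each page erase buys 64 updates. The sketch does not deal with power failing between steps (f) and (g), which would leave two valid blocks; real code would need a rule for picking the newer one.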

    1. Barbudor you are brilliant. I had to read that a few times before it was completely clear and ended up drawing some pretty pictures to help but this is a better way given the write limitations of FLASH. One of the things I wanted was to update a flag on a regular basis but when I did the calculations, I was going to kill the flash in a year or so - doing it your way will give me at least 4 times the FLASH life, maybe 8. Well, thankfully, Tuan who's done the MQTT software has done a good job of making his storage code easy to read and so it should be no problem to adapt to your method or a variation - isn't it funny how sometimes you just need someone to trigger you off. Right now I'm not messing with it until the problem of the WIFI reconnect issue if that's what it is, is resolved but as soon as it's working reliably - I'm going to adapt what you've written above as that'll give me the ability to update the flash much more frequently. Thanks a lot.

      1. Maybe I don't understand your algorithm, but even if you want to write only 64 bytes you still need to erase a full page (4K), no matter if the new address contains only 0xFF, so you don't save any FLASH life.
        The best way is to use a header like Barbudor suggested for each page (4K) and move between the pages. In this way you don't save FLASH life, but if the erase or the write fails, you still have the old data in the old page.

        The rule for FLASH life is clear: (number of writes) = (number of erases).

        1. The idea is sound - YES you can only erase 4k at a time - but you can write in little blocks until the 4k is full - as long as you don't expect to be able to write '1's you can mark each preceding block as finished - when you run out of one block, you start on the next. So let's say you have 512 bytes of data to save including your status byte... You erase the 4k block and start at the beginning, writing your first block with status nibble binary 1110 which means writing, and when you're done, change that to 1100. When you need to update the info, read that block, write to the next one and set the previous status to 1000 or 0000 so your states are 1111 unused, 1110 writing, 1100 current, 1000 or 0000 out of date. So for any block once erased, you get in this instance 8 uses before moving onto the other block - you have to have 2 blocks because at some point you have to erase one of them to all 1s.

          1. Sorry, my confusion was because of my day job.
            I work in one of the big Flash companies. Here, because we have to manage GB of data, our controller will erase 4K for each write.

            Well this is not the case here. We are doing the controller.
            So... the minimum size to write here is 256 bytes, so if we use Barbudor's algorithm with 2 sectors (8K) then we could write once every minute for 6 years.
            pretty good 🙂
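One set of numbers that reproduces roughly six years is sketched below in C. The figure of 100,000 erase cycles per sector is my assumption, not something stated in the comment, so check the datasheet of the actual flash chip; the function names are made up too.

```c
#include <stdint.h>

/* Total number of record writes before every sector has used up its
   erase cycles, assuming the sectors are filled record by record. */
static uint64_t lifetime_writes(uint32_t sectors, uint32_t sector_bytes,
                                uint32_t record_bytes, uint32_t erase_cycles) {
    uint64_t records_per_sector = sector_bytes / record_bytes;
    return (uint64_t)sectors * records_per_sector * erase_cycles;
}

/* One write per minute, so writes == minutes. */
static double minutes_to_years(uint64_t minutes) {
    return (double)minutes / (60.0 * 24.0 * 365.0);
}
```

With 2 sectors of 4096 bytes, 256-byte records and the assumed 100,000 cycles, `lifetime_writes` gives 2 x 16 x 100,000 = 3,200,000 writes, which at one a minute is a little over 6 years.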

            I'm working now on a solution for saving data from the RAM to the Flash when an unexpected power-down occurs: a capacitor in parallel with the ESP. A GPIO interrupt will detect the power loss and save the data. I think 1 sec will be sufficient.

    1. GPIO0 is used as an output - I have a LED then a pull up to +V around 1k - so clearly the LED status is reversed.... GPIO-2 same (I have a test board with LEDS on these) - and In any case I have a 4k7 pullup on GPIO2 on the end of my Dallas chip.... all of that works fine - on RESET.... erm, on the ESP-01 would that be CH-PD - that's tied to VCC.

      1. many thanks for your information! 🙂

        last question 🙂

        i'm facing reboots when using secure connection with MQTT (i'm using nodemcu lua firmware).
        Do you know if i have to put the ca.crt file somewhere or i just need to set the secure bit to 1 in m:connect(ip, port, secure) function and do NOT put the ca.crt file on the esp8266 at all?

      2. Based on your experience... must GPIO2 be pulled up for normal operation or can it be left floating? I know that GPIO0 must be pulled up, but i don't know about GPIO2
        Thanks 🙂
