Espressif Save to FLASH–an alternative

In this article I’ll talk about using FLASH to store information on the ESP8266 wireless units (I refer here to those with 4Meg or more of FLASH, on the assumption that only the first 2 meg is in use, plus a tiny bit at the end)…

To clarify – the ESP can use 1 meg of FLASH for program space, plus another meg if using OTA (over-the-air updating); that is, the second meg is used to upload a new program while the original is still running. In addition, Espressif use a tiny bit at the very top of whatever size FLASH you have. I’m going to talk here about units with 4MB of FLASH, for example the very popular and cheap ESP-12 and variants. If you have a unit with MORE memory then this could be expanded. LESS and this isn’t worth reading.

To store user data securely, Espressif use three 4K blocks…

(Image: three blocks – Espressif user data storage)

The first two each hold up to 4096 bytes of data; the third says which of them is the one currently in use, so that third block is mostly wasted.

So when you want to update the data, you see which of the two is NOT current, wipe it… then update it, then update the third block with the pointer.

So you use three 4K blocks to get one good block of data. And I have to say it works, 100% – I’ve never lost data. Should the write procedure fail part way through due to a power cut or whatever, all you lose is the latest update: the current block remains the previous one you were using.
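To make the three-block idea concrete, here is a minimal sketch that simulates it in RAM rather than real FLASH. The array `sim` and the functions `three_sector_save()` and `three_sector_load()` are hypothetical names for illustration, and this is not Espressif’s SDK code; it only models the "rewrite the spare block, then move the pointer" sequence described above. Treating any flag value other than 1 as "block 0" is an assumption of the sketch.

```c
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 4096

// Hypothetical RAM stand-in for three consecutive 4K FLASH sectors:
// sim[0] and sim[1] hold data, sim[2] holds only the "which is current" flag.
static uint8_t sim[3][SECTOR_SIZE];

static void sim_erase(int s) { memset(sim[s], 0xFF, SECTOR_SIZE); }

// Assumption: a flag of 1 means sector 1, anything else means sector 0.
static int current_sector(void) { return sim[2][0] == 1 ? 1 : 0; }

// Rewrite the sector that is NOT current, then point the flag at it.
// A power cut before the flag sector is touched leaves the old copy current.
void three_sector_save(const void *data, size_t len)
{
    int next = current_sector() ^ 1;
    sim_erase(next);
    memcpy(sim[next], data, len);   // len must be <= SECTOR_SIZE
    sim_erase(2);                   // the flag costs a whole 4K sector
    sim[2][0] = (uint8_t)next;      // commit: this copy is now current
}

void three_sector_load(void *data, size_t len)
{
    memcpy(data, sim[current_sector()], len);
}
```

Note how the flag needs a full sector of its own even though it holds one byte; that is the waste the rest of this article tries to avoid.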

In the SDK there are routines for this (system_param_save_with_protect() and system_param_load() in the non-OS SDK). I recall TUANPM had his own, but when Espressif incorporated this into the SDK I moved to their routines. They work perfectly.

The only issue with this is the wasted block; taken to extremes, using other spare blocks of FLASH the same way would be quite wasteful. I thought of the various existing file systems out there and figured they were overkill for my needs and not necessarily as secure as this.

So I had this idea: take TWO 4K blocks and use the last 32 bits of each as a counter.

To read, look to see which of the two has the highest number that is NOT FFFFFFFF – that is the block to read.
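The choice of block can be sketched as a small C function, assuming the two counters have already been read out of FLASH. `pick_current()` and `BLANK` are hypothetical names, and returning -1 for "neither block written yet" is a convention of this sketch.

```c
#include <stdint.h>

#define BLANK 0xFFFFFFFFu   // an erased (never written) 32-bit word

// Given the counters read from block A and block B, return 0 if A is the
// block to read, 1 if B is, or -1 if neither has ever been written.
int pick_current(uint32_t count_a, uint32_t count_b)
{
    if (count_a == BLANK && count_b == BLANK) return -1;
    if (count_a == BLANK) return 1;        // only B holds a valid count
    if (count_b == BLANK) return 0;        // only A holds a valid count
    return count_b > count_a ? 1 : 0;      // highest valid count wins
}
```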

(Image: 2 blocks – counters at end)

When writing, write the data to whichever block is NOT current, including an incremented count (avoiding 0 and FFFFFFFF). As the count is the last thing to be written, a failure due to power loss should leave that 32-bit number NOT updated, OR still at FFFFFFFF from the erase. In either case we stick with the current block, so only the latest update is lost – just like the Espressif version, but using only 2 blocks.
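The "incremented count, avoiding 0 and FFFFFFFF" rule might look like this; `next_counter()` is a hypothetical helper. After a wrap the count drops back to 1, which would break "highest wins", but with a block good for roughly 100,000 erase cycles that point is never reached in practice.

```c
#include <stdint.h>

// Next sequence number for the block being written, skipping the two values
// the scheme reserves: 0xFFFFFFFF (erased/blank) and 0.
uint32_t next_counter(uint32_t current)
{
    uint32_t next = current + 1;        // unsigned: 0xFFFFFFFF + 1 wraps to 0
    if (next == 0xFFFFFFFFu) next = 1;  // would look blank: restart at 1
    if (next == 0) next = 1;            // current was blank: first write
    return next;
}
```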

Reader Gary came up with a slightly better idea: write the block, whatever it may be, with the first 4 bytes set to FF (you must write on 4-byte boundaries to avoid the ESP having a fit). Once the sector is written you can go back and fill those 4 bytes in, because programming can always turn 1s into 0s; what you can’t do is turn 0s back into 1s, as that is the job of the (4K) erase-block routine. I’ve now tested this, checking adjacent 32-bit words, and it all works a treat.
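Why this works: erasing sets every bit to 1, and programming can only clear bits, so a program operation behaves like a bitwise AND with what is already there. The following is a hypothetical model of a single 32-bit word (not real FLASH access); `flash_word`, `fw_erase()` and `fw_program()` are illustrative names.

```c
#include <stdint.h>

// Hypothetical model of one 32-bit FLASH word (not real FLASH access):
// erasing sets every bit to 1, programming can only clear bits.
typedef struct { uint32_t word; } flash_word;

void fw_erase(flash_word *w)               { w->word = 0xFFFFFFFFu; }
void fw_program(flash_word *w, uint32_t v) { w->word &= v; }   // 1 -> 0 only
```

Writing FFFFFFFF as a placeholder leaves the word untouched, and the real counter can be programmed over it later. Programming 43 over an existing 42, on the other hand, leaves 42 (binary 101010 AND 101011 is 101010), which is why a new counter value always needs a freshly erased block.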

(Image: 2 blocks – counters at start, overwritten)

The next stage was to formulate this into something tangible…

To read into a structure in RAM

Check the first 4 bytes of both FLASH blocks. If both are FF, the current block is the first; else if one block is FF, the current block is the other block; otherwise the current block is the one with the higher count.

If not new, read the structure (its size in bytes) from the relevant offset in the current block. If new, fill the struct with FF!! No point in reading anything.
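A minimal sketch of these read steps, using a RAM array `pair` in place of the two 4K FLASH blocks (a hypothetical stand-in, so it can be tried on a PC). `pair_read()` is an illustrative name, not the SDK-based routine in this post.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 4096
#define BLANK 0xFFFFFFFFu   // an erased 32-bit word

// Hypothetical RAM stand-in for one pair of 4K FLASH blocks: the first
// 4 bytes of each block are its counter, the data follows.
static uint8_t pair[2][BLOCK_SIZE];

static uint32_t counter_of(int b)
{
    uint32_t c;
    memcpy(&c, pair[b], 4);
    return c;
}

// Copies len bytes from offset off of the current block into buf and
// returns 1, or fills buf with 0xFF and returns 0 if the pair is new.
int pair_read(uint32_t off, void *buf, uint32_t len)
{
    assert(off + len <= BLOCK_SIZE - 4);        // must fit after the counter
    uint32_t a = counter_of(0), b = counter_of(1);
    int cur;

    if (a == BLANK && b == BLANK) {             // both FF: new, nothing to read
        memset(buf, 0xFF, len);
        return 0;
    }
    if (a == BLANK)      cur = 1;               // only block 1 has a count
    else if (b == BLANK) cur = 0;               // only block 0 has a count
    else                 cur = (b > a) ? 1 : 0; // higher count wins
    memcpy(buf, pair[cur] + 4 + off, len);      // skip the 4-byte counter
    return 1;
}
```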

To write a structure to FLASH

Check the first 4 bytes of both FLASH blocks. If both are FF, this is NEW. Else if either is FF, the current block is the other one; otherwise the current block is the one with the higher count.

If NEW, ERASE the first block, write the data into the relevant part of it, then write the number 1 into its first 32 bits (the counter). Check, and report any failure.

Otherwise, read the current block into a 4K RAM buffer and put FFFFFFFF into its first 4 bytes. Erase the OTHER block and write the RAM copy, with your updates, into it. Then take the original counter, increment it and write it into the first 4 bytes of the new block. Check, and report any failure.
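The write steps can be sketched the same way, again with a hypothetical RAM `pair` standing in for FLASH; `pair_write()` and `current_block()` are illustrative names. The `commit` flag is an addition for demonstration only: when it is 0 the routine stops just before the counter is written, to mimic a power cut at the worst moment.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 4096
#define BLANK 0xFFFFFFFFu

static uint8_t pair[2][BLOCK_SIZE];   // hypothetical RAM stand-in for two 4K blocks

static uint32_t counter_of(int b) { uint32_t c; memcpy(&c, pair[b], 4); return c; }

static int current_block(void)        // -1 if neither block has been written
{
    uint32_t a = counter_of(0), b = counter_of(1);
    if (a == BLANK && b == BLANK) return -1;
    if (a == BLANK) return 1;
    if (b == BLANK) return 0;
    return b > a ? 1 : 0;
}

// Write len bytes at offset off. With commit == 0, stop before the counter.
void pair_write(uint32_t off, const void *data, uint32_t len, int commit)
{
    assert(off + len <= BLOCK_SIZE - 4);
    int cur = current_block();
    int next = (cur == 0) ? 1 : 0;                  // NEW pair: use the first block
    uint32_t count = (cur < 0) ? 0 : counter_of(cur);

    memset(pair[next], 0xFF, BLOCK_SIZE);           // erase the OTHER block
    if (cur >= 0)                                   // carry over all but the counter
        memcpy(pair[next] + 4, pair[cur] + 4, BLOCK_SIZE - 4);
    memcpy(pair[next] + 4 + off, data, len);        // overlay the update
    if (!commit) return;                            // "power cut": counter still FF
    count++;
    memcpy(pair[next], &count, 4);                  // counter LAST: the commit point
}
```

With `commit` set to 0, `current_block()` still reports the previously written block, because the new block’s counter is left at FFFFFFFF.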

Apart from that double read-write, which might be small depending on your structure, this seems reasonably efficient.

And the result…

my_flash_read(sector, offset, buffer, buffer_size)

and

my_flash_write(sector, offset, buffer, buffer_size)

The assumption here is that each "sector" is in fact a pair of 4K sectors, so from the start of the third Meg to nearly the end of a 4 Meg FLASH (ESP-12 for example) you’d be able to use sectors 0-253 of protected data. The offset is the offset from the start of the sector where your data is to go, the buffer is the address of your buffer, and buffer_size must be no more than 4092 bytes (offset plus size must fit after the 4-byte counter). OK, so any block can only be erased and rewritten maybe 100,000 times – but if you wanted to, you could keep track of that.
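The address arithmetic behind "sectors 0-253" can be sketched as follows, assuming the pairs start at 0x200000 and each pair is two 4K sectors (8K). `pair_address()` and `pair_erase_sector()` are hypothetical helpers; the second reflects that `spi_flash_erase_sector()` in the SDK takes a sector number rather than a byte address.

```c
#include <assert.h>
#include <stdint.h>

#define FLASH_SECTOR 4096u
#define PAIR_BASE    0x200000u   // start of the third megabyte
#define MAX_PAIR     253u        // keeps clear of the top four 4K sectors

// Byte address of block 0 of pair n; block 1 starts FLASH_SECTOR bytes later.
uint32_t pair_address(uint32_t n)
{
    assert(n <= MAX_PAIR);
    return PAIR_BASE + n * 2u * FLASH_SECTOR;
}

// Sector number (not address) of block 0 or 1 of pair n, as needed for an erase.
uint16_t pair_erase_sector(uint32_t n, int block)
{
    return (uint16_t)(pair_address(n) / FLASH_SECTOR + (uint32_t)block);
}
```

Pair 253 therefore occupies 0x3FA000 to 0x3FBFFF and ends at 0x3FC000, leaving the top four 4K sectors of a 4MB chip alone.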

For those interested, the working test code is below – not the most concise, but it works. I gave it a good hammering this morning; I’ve yet to decide how best to use this, but it’s a great move forward from thinking of that extra space as a great black hole. I’ve also put in a second version of the WRITE routine that does NOT use a 4K buffer – initial testing suggests it is working fine... mind you, the tests are small.

[pcsh lang="cpp" tab_size="4" message="" hl_lines="" provider="manual"]

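// Some assumptions: your offset is on a 4-byte boundary (the size doesn't have to be), your sector is 0-253 (for the top 2 meg),
// and your offset + size does not go over 4092 (remember, the first 4 bytes of each block hold the counter).

#include <string.h>     // memcpy, memset
#include "c_types.h"    // ESP8266 non-OS SDK types and ICACHE_FLASH_ATTR
#include "spi_flash.h"  // spi_flash_read, spi_flash_write, spi_flash_erase_sector

#ifndef IFA
#define IFA ICACHE_FLASH_ATTR  // assumption: IFA is shorthand for ICACHE_FLASH_ATTR
#endif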

uint8_t IFA my_flash_read(uint32_t fsec,uint32_t foff, uint32_t *fbuf,uint32_t fsiz)
{ //READ
uint32_t first;
uint32_t second;
uint32_t current;
uint32_t whole;
uint32_t addr;
uint32_t tmp;
uint8_t good;

if (fsec>253) { return 0; } // assuming starting at third meg and can't touch very top 4 secs (2 for each of these)
if ((foff+fsiz)>4092) { return 0; } // data must fit in the block after the 4-byte counter
uint32_t flashStart=(fsec*2*4096)+0x200000; // each "sector" here is a PAIR of 4K flash sectors

spi_flash_read(flashStart,&first,4);        // counter of block 0
spi_flash_read(flashStart+4096,&second,4);  // counter of block 1

if ((first==0xffffffff) && (second==0xffffffff)) // ie all new - nothing saved yet
	{
	memset(fbuf,0xff,fsiz); // no point reading anything: hand back FF
	return 0;
	}
else if (first==0xffffffff) current=1;
else if (second==0xffffffff) current=0;
else if (second>first) current=1;
else current=0;

addr=flashStart+(current*4096)+4+foff; // skip the 4-byte counter
whole=fsiz&0xfffffffc;                 // bytes in complete 32-bit words

spi_flash_read(addr,&tmp,4);
good=(tmp!=0xffffffff) ? 1 : 0; // first word still blank: nothing stored at this offset yet

if (whole) spi_flash_read(addr,fbuf,whole);
if (fsiz&3) // can't read the whole struct in one go if it isn't a multiple of 4 bytes
	{
	spi_flash_read(addr+whole,&tmp,4);
	memcpy((uint8_t *)fbuf+whole,&tmp,fsiz&3); // byte pointer: move those last 1, 2 or 3 bytes
	}

return good;  // so you know if it is the first time or not
} // done with READ operation - phew!!


// and now to tackle writing - same rules as reading - this version uses a 4K buffer

uint8_t IFA my_flash_write(uint32_t fsec,uint32_t foff, uint32_t *fbuf,uint32_t fsiz)
{ //WRITE
static uint32_t bigbuf[1024]; // 4K: static to keep it off the ESP8266's limited stack, and word-aligned for spi_flash_read/write
uint32_t first;
uint32_t second;
uint32_t current;
uint32_t counter;
uint32_t tmp;

if (fsec>253) { return 0; } // assuming starting at third meg and can't touch very top 4 secs (2 for each of these)
if ((foff+fsiz)>4092) { return 0; } // data must fit in the block after the 4-byte counter
uint32_t flashStart=(fsec*2*4096)+0x200000; // each "sector" here is a PAIR of 4K flash sectors

spi_flash_read(flashStart,&first,4);
spi_flash_read(flashStart+4096,&second,4);

if ((first==0xffffffff) && (second==0xffffffff)) current=0; // all new
else if (first==0xffffffff) current=1;
else if (second==0xffffffff) current=0;
else if (second>first) current=1;
else current=0;

counter=current ? second : first;
spi_flash_erase_sector((flashStart/4096)+(current^1)); // erase the OTHER one (takes a sector number, not an address)

spi_flash_read(flashStart+(current*4096),bigbuf,4096); // copy of the current block
bigbuf[0]=0xffffffff;                                 // blank counter: the new block stays invalid for now
memcpy((uint8_t *)bigbuf+4+foff,fbuf,fsiz);           // write the new data into the buffer
current^=1;                                           // from here on, "current" is the freshly erased block
spi_flash_write(flashStart+(current*4096),bigbuf,4096);

if (counter==0xffffffff) counter=0; // new: first counter will be 1
counter++;
spi_flash_write(flashStart+(current*4096),&counter,4); // counter LAST - this is the commit
spi_flash_read(flashStart+(current*4096),&tmp,4);
return (tmp==counter) ? 1 : 0; // 1 if the new counter reads back correctly
} // done with WRITE operation

// this version of writing does NOT use a 4K buffer - it copies the old block across 4 bytes at a time

uint8_t IFA my_flash_write_no_buffer(uint32_t fsec,uint32_t foff, uint32_t *fbuf,uint32_t fsiz)
{ //WRITE
uint32_t first;
uint32_t second;
uint32_t current;
uint32_t newcurrent;
uint32_t whole;
uint32_t counter;
uint32_t tmp;

if (fsec>253) { return 0; } // assuming starting at third meg and can't touch very top 4 secs (2 for each of these)
if ((foff+fsiz)>4092) { return 0; } // data must fit in the block after the 4-byte counter
uint32_t flashStart=(fsec*2*4096)+0x200000; // each "sector" here is a PAIR of 4K flash sectors

whole=fsiz&0xfffffffc; // bytes in complete 32-bit words; a 1-3 byte tail is handled separately

spi_flash_read(flashStart,&first,4);
spi_flash_read(flashStart+4096,&second,4);

if ((first==0xffffffff) && (second==0xffffffff)) current=0; // all new
else if (first==0xffffffff) current=1;
else if (second==0xffffffff) current=0;
else if (second>first) current=1;
else current=0;

counter=current ? second : first; if (counter==0xffffffff) counter=0; // new: first counter will be 1
spi_flash_erase_sector((flashStart/4096)+(current^1)); // erase the OTHER one
current*=4096;
newcurrent=current ? 0 : 4096;

tmp=0xffffffff;
spi_flash_write(flashStart+newcurrent,&tmp,4); // write a blank counter (already FF after the erase)

uint32_t tstart=flashStart+current+4;      // old block, just past its counter
uint32_t tstart2=flashStart+newcurrent+4;  // new block, just past its counter
uint32_t tdata=tstart+foff;                // where the new data goes
uint32_t tend=flashStart+current+4096;     // end of the old block

while (tstart<tdata) // copy what comes before the new data
	{
	spi_flash_read(tstart,&tmp,4);
	spi_flash_write(tstart2,&tmp,4);
	tstart+=4; tstart2+=4;
	}

if (whole) spi_flash_write(tstart2,fbuf,whole); // whole words straight from the caller's buffer
tstart+=whole; tstart2+=whole;

if (fsiz&3) // last 1-3 bytes: merge into the old word so we never read past the end of fbuf
	{
	spi_flash_read(tstart,&tmp,4);
	memcpy(&tmp,(uint8_t *)fbuf+whole,fsiz&3);
	spi_flash_write(tstart2,&tmp,4);
	tstart+=4; tstart2+=4;
	}

while (tstart<tend) // copy the rest of the old block
	{
	spi_flash_read(tstart,&tmp,4);
	spi_flash_write(tstart2,&tmp,4);
	tstart+=4; tstart2+=4;
	}

counter++;
spi_flash_write(flashStart+newcurrent,&counter,4); // counter LAST - this is the commit
spi_flash_read(flashStart+newcurrent,&tmp,4);
return (tmp==counter) ? 1 : 0; // 1 if the new counter reads back correctly
} // done with WRITE operation


[/pcsh]

28 thoughts on “Espressif Save to FLASH–an alternative”

  1. Hi there !

    Sorry for posting here ( quite related to the subject but not exactly it :/ .. )

    After stumbling upon your article, I was wondering if you ever tried replacing the flash chip on esp’s-xx ?

    I am currently facing a problem you may have already overcome: using 16Mbytes flash chip ( W25Q128 ) & flashing it to use most of the space available 😉

    The following link provided useful hints on the subject, but I’m still missing some to estimate implementation time ( and if I can do it at all 😐 .. ):
    https://github.com/themadinventor/esptool/issues/123

    So, feel free to spread your well-earned knowledge if it can help in the following:
    http://forum.espruino.com/conversations/279176/?offset=125#comment13202242

    Thanks for the reading,
    Keep up the good work ;p


  2. SteveL – what I’m doing is adding in an XOR of the counter for speed – just after the counter – then checking both. If there’s a power cut – it will never get that far and the block will either have an out of date count – or FFFFFFFF or the two values are unlikely to XOR correctly. That seems a reasonable compromise. Don’t want to make this one a lifetime project – considering the existing options don’t do any of this.

  3. Hi Pete,

    At the risk of asking what might be an FAQ, I’m really struggling getting started with OTA.

    It’s not that I need help debugging anything, as I haven’t even got as far as trying it – I’m basically struggling to get my head around where to start. I’ve tried to go through as many of your blog posts as possible and have your complete Home Automation project downloaded, but that’s super overkill.

    I’d be grateful if you have any pointers to blog posts etc on the basics, i.e., flash a virgin ESP12F module (using a USB/serial module) into a vanilla OTA-ready state which does nothing but check periodically for an OTA update which will contain extra code which actually does “something” (and also continues to check for updates). Am I making sense?

    I know how to flash an ESP with USB/serial module etc, it’s just the vanilla OTA code I can’t seem to find reference to.

    Cheers,
    Brian

      1. Hi John,

        Either Win XP or Debian Linux (the latter is my main desktop). So far with ESP modules I’ve tried NodeMCU & ESPlorer with LUA programming (not good) and also the Arduino IDE which I prefer as the C family of languages is more my line. I haven’t got Eclipse setup yet but I think it might be the way to go (I’ve used Eclipse for Android app development and feel comfortable with it).

        Cheers,
        Brian

        1. Brian,

          The Arduino/Linux/C combination is one of the easiest to implement. Have a look at the “how-to” on this blog:-

          Over The Air update for ESP8266 using Arduino IDE

          Start at the “Let’s start” line (unless you’ve got a really old Arduino IDE installation).

          Basically you add the (strange looking) OTA code from BasicOTA.ino (hidden in .arduino15/packages in your home directory) into your “setup()” function and then add the single line:-

          ArduinoOTA.handle();

          …into the “loop()” function. You’ll need to upload that via serial. After that your ESP will appear as a network device under the “port” option of the “Tools” menu (you may need to restart the IDE to get it to show up).

          Good luck!

          -John-

  4. Old and set in my ways am I, so I usually proceed (over breakfast) through the normal sequence of email, world news, tech news and embedded news, finishing off with the “entertainment, ideas and inspiration” section – Hack-a-Day and Pete’s Tech blog. Imagine my horror this morning when I found that my last unread article in HaD was a link to the latter. Yup, as expected “Failed database connection” and bugrall else for half an hour.
    Congratulations on the HaD exposure Pete, that must certainly have put a bump on the daily readership graph (going to have to start kicking your provider again, though). 😉

    1. Hi – well, I did kick them when I got back last night – and the site is up – no reason why it was down yet… yes, interesting that someone has copied what I’ve done – now I’ll grant you it is nice that they’ve taken the interest – but of course as I improve that code – the Hackaday version becomes… wrong! And I’m already working on an improvement.

  5. What if you lose power part-way through writing the sequence number so some of the bits have been written and some not?

      1. In that case you will not have updated the new FLASH area with a higher number. The very first thing that happens is that the new flash area is erased to FF in one operation – that invalidates it. I then write FF into the first 4 locations, which keeps it invalid, I write the rest, and only THEN do I go back and write in the increased number. That should be fairly foolproof except in the very unlikely case where somehow the entire sector gets randomised. For that I’m considering adding in a checksum after the first 4 bytes, or just the XOR of the first 4 bytes – remember none of this happens until the data has already been written.

      1. I think you’ve got to consider this… wherever you say “in one operation” (e.g. “the new flash area is erased in one operation”) you might want to consider that these operations take a finite amount of time and that the power loss could happen during that time.

        For example, suppose you have blocks with sequence 0x00001234 and 0x00001235 and that you want to erase the older block, which will convert sequence number 0x00001234 to 0xFFFFFFFF. Power loss part way through the erase might leave this as 0x88889ABC, which is “higher” than 0x00001235 (and so deemed “newer”) but will then contain partially erased data.

            1. Yes, but erase takes a finite time, during which some bits will not have been erased (i.e. they are below the ‘1’ charge threshold), some bits will have been partially erased (only just past the ‘1’ charge threshold, so not reliable) and some bits will have been fully erased.

            Check the data sheet for whatever FLASH device you are using – the block erase time will be non-zero, and usually quite large compared to the sector programming time. So from a software perspective erase is a single step, but in the real world a power loss can leave the bits as firm zeros, firm ones or something in between!

  6. @Squonk NOR flash also resets to binary 1’s (i.e. 0xFF bytes). Check out page 41 of the Winbond data sheet.

    “The Page Program instruction allows from one byte to 256 bytes (a page) of data to be programmed at previously erased (FFh) memory locations. ”

    @Peter, as far as using a word write to an 0xFFFF location in the flash block, this is perfectly fine and yes, is nothing new. One assumption you have that is incorrect concerns the state of the block after a failed erase or write: it is undetermined. You might read back garbage; in this case the counter value could be much higher, but you would not want to use this block. You can get around this by writing a known 4 or 8 byte pattern to the block so you know that you are not reading data from a block that failed erase. For more safety, before you erase the block, you want to clear the pattern to 0 as a precaution. Of course, it is possible that the pattern could randomly return mid-erase, but I think the probability is pretty low at this point.

    Next to be ultra safe you would also have to consider a power loss when writing the counter value and it is corrupted. Now your counter value could be above or below the desired counter value. This generally would work out since you would either use the block or revert back to the backup block. There however could be a case where you read MAX-1 as the count. That does not leave room for any more updates unless you now handle overflow condition of the counters.

    The alternative is to add another write after the counter write. It must be done as a separate command sequence. This write would clear a byte or word next to the counter. Now, before using the block counter value, first test that the “check-byte” is clear, so you know the block counter is good (and also verify the block pattern from above).

    1. I’m covered for some of your points but indeed you have raised others which are interesting. I will go back to this and perhaps – take the SECOND 4 bytes and make them the XOR of the first 4. That should add a little extra security. Good – I’m glad you wrote that.

    2. Or just add a CRC (preferably 32-bit) value (of the data AND counter) to the counter. That way you know if all the data is correctly written and correctly read back.

  7. Good idea, but there is one important thing missing (unless I’ve just missed it): a CRC. Even if the counter is written last, and even if the written data is verified before writing the counter, there is always a risk of an application (or something else) corrupting the data in a block. I’d add a CRC over all the application data and the counter, then write it last together with the counter.

  8. Actually, I may be wrong as a Page Program (02h) can take from 1 to 256 bytes to write:
    https://www.winbond.com/resource-files/w25q32bv_revi_100413_wo_automotive.pdf#page=41

    If the last address byte is non-zero, then the following data bytes will be written starting at an address that is not page aligned, for as long as clock cycles continue, wrapping around to the start of the page if you cross the page boundary.

    It can be interesting to perform 1 million writes at the same address with the same data, and then try to write something else after erasing the sector… It should work, but you need to make sure the system doesn’t perform erasures behind your back.

    1. Well, right now I’m writing 4 bytes at a time to save having to have a 4K buffer. Is there a stunning advantage to writing 256 bytes at a time (it makes weaving in an arbitrary buffer more difficult)?

  9. Nothing against you, Pete 😉

    There are only a few new things, but they tend to be more and more rare 🙂 Just don’t rush to get a patent on this one!

    JFFS is not that bad: it is a FS dedicated to Flash memory, originally written by Axis. It takes your idea one step further by making each file a linked list of versions of the original file, with a garbage collector process in charge of recycling the older blocks.

    What I said is true if you have fine access to the Flash memory, typically like within an MCU with integrated Flash, where you can write word by word. My technique is good when you have no EEPROM and you need to persist a counter into Flash memory with frequent updates.

    Unfortunately, on these SPI Flash chips like the ones used along with the ESP8266, the only possible erase/write operations are by using a chip/sector/32KB or 64KB block erase followed by a single or quad page program operation.

    For example in the Winbond w25q32bv (32MBit) SPI Flash datasheet:
    https://www.winbond.com/resource-files/w25q32bv_revi_100413_wo_automotive.pdf

    On page 5, it is specified as “More than 100,000 erase/program cycles”, and the corresponding instructions are 60h, 20h, 52h, D8h, 02h and 32h, respectively.

    So with these SPI Flash chips, if you write 56 endlessly at a given location, you erase the sector as many times, and it wears out after approximately 100,000 cycles 🙁

    Your technique is still valid though; the only drawback is that it takes one word out of each block for your signature, still much better than Espressif’s original design.

  10. In fact, the erase operation performs differently depending on the Flash memory technology: all bits are reset (to 0) for NOR Flash, and set (to 1) for NAND Flash.

    However, it is OK to assume that you can write words, and even bits individually. What is happening when writing individual words like this, is that you only force the bits that have a value opposite to their erased value (i.e. forcing bits to 1 for NOR, and bits to 0 for NAND, leaving others untouched).

    As what is limited in terms of number of operations in Flash memory are erasures and NOT actual writes, I used once this property to implement a counter as a running bitmask into a whole Flash sector, thus multiplying the maximum Flash memory lifetime by the number of bits in the sector.

    What you propose here is not particularly new, this is what is used in Flash-based Journaling Flash File Systems (JFFS) for Flash wear-levelling.

    1. Of course it’s not NEW (is anything?) – but it’s mine – I didn’t steal it from anywhere, and hopefully it will help others, as I’m avoiding using any terms like JFFS 🙂 – also I’ve just written an alternative version that doesn’t use a buffer (I don’t like the idea of a 4K temporary hike in RAM when this is used). Still testing.

      NOW – what you say is interesting – and new to me… so you can only ERASE X number of times, so therefore writing has to stay at the SAME figure... I assume what you are saying is you could write 56 to a location endless times – but it isn’t actually doing anything… in order for it to make a change, you’d have to erase it – so you’re back to that X limit – but perhaps you’re saying that it would be OK to change one bit at a time, hence for a meg that is 8 million “writes” and get away with it, yes?

  11. Hi Pete,

    yes, you can write individual bytes, which will be written correctly providing the original memory location has all bits set (erased).

    For clarity, eeproms can only clear bits when writing data, only the 4k sector erase can convert the bits within the eeprom to 1.

    Cheers

    Glen.

  12. Sounds like a good idea. As you say, providing you write the increment last.

    Writing to eeprom I would write the first 4 bytes as FFFFFFFF, so you can go back and stick the incremental number in last.

    1. Hi Glen

      Now just clarify that for me please – am I right in assuming then that although you have to ERASE a 4K block, you can WRITE individual 32-bit words?? If that is the case then yes your suggestion makes absolute sense.
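Several of the comments above suggest strengthening the counter with an XOR check word and a CRC over the data and counter. A minimal sketch of both follows; the `block_header` layout, the choice of the counter’s complement (counter XOR FFFFFFFF) as the check word, and the function names are all assumptions for illustration. The CRC is the common reflected CRC-32 (polynomial 0xEDB88320), computed bit by bit to keep it small. As with the counter alone, the header would be the last thing written.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical header: counter, its complement as a check word, and a
// CRC-32 over the counter followed by the data bytes.
typedef struct {
    uint32_t counter;
    uint32_t check;   // counter ^ 0xFFFFFFFF
    uint32_t crc;     // CRC-32 of counter + data
} block_header;

// Reflected CRC-32 (poly 0xEDB88320); pass 0 to start, or a previous
// result to continue over more bytes.
uint32_t crc32_update(uint32_t crc, const uint8_t *p, size_t n)
{
    crc = ~crc;
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

void header_make(block_header *h, uint32_t counter, const uint8_t *data, size_t n)
{
    h->counter = counter;
    h->check   = counter ^ 0xFFFFFFFFu;
    h->crc     = crc32_update(crc32_update(0, (const uint8_t *)&counter, 4), data, n);
}

// Trust a block only if the counter is not blank, the check word matches,
// and the CRC over counter + data agrees.
int header_valid(const block_header *h, const uint8_t *data, size_t n)
{
    uint32_t c = h->counter;
    if (c == 0xFFFFFFFFu || h->check != (c ^ 0xFFFFFFFFu)) return 0;
    return h->crc == crc32_update(crc32_update(0, (const uint8_t *)&c, 4), data, n);
}
```

The check word catches a half-written counter cheaply; the CRC additionally catches corrupted data, at the cost of reading the whole block back.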
