Here we started a conversation about SD Lifespan. How can we make our SBC projects run for longer? In comments I've seen elsewhere, people seem to think it is OK that a Pi may well fail within a year due to SD - I don't think that is even REMOTELY acceptable unless you're making a novelty games machine. Read more as we came up with some great solutions...
People make a big deal about the reliability of Linux - not a great deal of use if the entire file system will come to a halt in a year...
I'd never suffered this problem until recently - my first heating system issue has appeared after more than a year's continuous use (and that includes doing lots of experiments on the same system). You may have seen comments in earlier blog entries about this – for the first time ever I recently suffered a dead SD on one of my Raspberry Pi projects – stone, cold dead – read only and NOTHING on the Pi or my PC could encourage the SD to write again.
But let's digress - some of the solutions take time and effort - THIS one is simple.. https://goo.gl/meAAB8 - a DUAL SD adaptor.... if things fail - have someone flick the switch and reboot - right - back to the plot.... (thanks Antonio)...
It has been said that some cheap SDs are not as large as they seem and as soon as you exceed use beyond their ACTUAL size – the chips become read-only. I’ve yet to test this out but TKaiser has suggested testing all new SDs and in a previous comment has recommended SanDisk Extreme Plus.
The test program H2TESTW is widely available for free. I’m testing my first 16GB disk now – looks like it will take 20 minutes but as no user interaction is needed… time well spent.
In here you will find questions and opinions. In the comments hopefully you will find some resolution – lots of bright people read this blog and I’m hoping they have solutions rather than opinions.
If you read on the web about the subject of eMMC and SD and USB memory – it is hard to tell what is hard science and what is opinion.
For example there are blogs suggesting that instead of relying on SD, use a USB memory stick. I have trouble with this as the technology is similar. Why should a USB stick last any longer than an SD?
You’ll see reference to eMMC – there can be no doubt that eMMC (usually an internal module or chip) is usually faster than SD – but does it LAST any longer – some say yes, some say no. To be sure it is less convenient to back up compared to an SD you can simply pull out and replicate!
Then there is the hard disk. I have a natural tendency to think that a spinning disk has to be less reliable than solid state memory but every experience I have says the opposite. I could not tell you the last time a hard disk went bad on me. Of course – they tend to be more expensive – and they are very much larger than SD.
The general idea is that you can READ SD as often as you want but there is a limit sometimes described as 10,000 write cycles, sometimes described as 10 times that amount. I suspect the latter and that there is just a lot of old information out there.
Then there is WEAR LEVELLING wherein some SDs have a chip inside that helps prevent a single location being written to, too many times – knowledge on this seems to be akin to witchcraft. WHICH manufacturers use this in WHICH SDs and HOW effective is it? I’ve not found a single source of information on the subject that is up to date and verified.
Today I read about putting some directories into RAM.
In the /etc/fstab file you can add for example
tmpfs /var/log tmpfs defaults,noatime,nosuid,mode=0755,size=100m 0 0
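Works a treat but for one tiny item – Apache would not start up! The likely cause (I have not confirmed it) is that a tmpfs starts out empty at every boot, so the /var/log/apache2 directory Apache expects to write its logs into is no longer there.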
Several people have mentioned RAMLOG – but from what I can see – that no longer works with Jessie (the problem of old material hanging around on the web). This looks modern – and is reasonably straightforward to install – takes just a couple of minutes. https://github.com/azlux/log2ram – I installed it – and it works a treat. The default action is to update the disk every hour - but moving the file “log2ram” from /etc/cron.hourly to /etc/cron.daily to me makes more sense.
So many questions – so many potentially wrong answers. See comments about actual number of writes to SD – would you believe the write limit for any given location (not the one you see but the REAL location) could be as low as 1000s rather than 10s of thousands – I had no IDEA it was that low.
On the subject of power supplies, in the comments you’ll find code for testing the likes of the Raspberry Pi – as there are registers in the Pi which pick up voltage issues… I was horrified how easily a long USB lead would allow the Pi to work – but continually register issues.
In testing – I found comments from TKaiser useful – then when wondering about the CPU frequency I found THIS article – and the associated script useful..
So already we see a need to reduce writes, only use good, tested SDs, use good supplies with short leads. Not new, not rocket science but I am seeing some good science behind the need for this and look forward to reading more of your educated comments.
Keep the comments coming!
A Little Test
In the process of this discussion, TKaiser supplied us with a little script to return some information about power from the likes of the Pi2 or Pi3. This was intended to be used as a command line tool – repeating until told otherwise. Well, I like REPORTS…
I took out the loop section so as to return a single line of information – and that can conveniently be run in an EXEC node in Node-Red
I changed the script to simplify output – if someone can tell me how to produce output without “’C” and “V” so we have just numbers coming out – would be nice… I called this tk2.sh (changing permissions – don’t forget) and ran that inside an EXEC node in Node-Red…
(If you see a question mark above - the word is sampling)
As you can see we have some space-delimited values! If you look at the bitfield, semi-permanent recordings of issues are on the left (most significant bits) while on-going issues are on the right. Extracting from TKaiser’s notes..
The bits on the right are:
0: under-voltage
1: arm frequency capped
2: currently throttled
And corresponding on the left:
16: under-voltage has occurred
17: arm frequency capped has occurred
18: throttling has occurred
It is easy enough to break this down..
Here is another version where I have split up the values
The first is the input – the second is the split version – the same except they are now in 4 different places
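// Split the space-delimited line from the EXEC node into its four values
var reading = msg.payload.split(" ");
msg.payload = reading[0] + " " + reading[1] + " " + reading[2] + " " + reading[3];
return msg;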
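On the question of getting bare numbers without the units: one way, as a rough sketch, is to strip them in the function node rather than in the script. The helper name stripUnits and the sample line are made up for illustration – the real field order depends on what tk2.sh prints.

```javascript
// Hypothetical helper, not part of TKaiser's script.
// Converts fields like "48.3'C" or "1.2V" to plain numbers and leaves
// everything else (such as the bitfield) untouched as text.
function stripUnits(line) {
    return line.trim().split(/\s+/).map(function (field) {
        // A number followed by a non-numeric unit suffix
        if (/^-?\d+(\.\d+)?[^\d.]+$/.test(field)) {
            return parseFloat(field);
        }
        return field;
    });
}

// Made-up sample line, for illustration only
var sample = "48.3'C 600MHz 1010000000000000101 1.2V";
console.log(stripUnits(sample));
```

Inside a function node the same helper could be used as msg.payload = stripUnits(msg.payload); followed by return msg;. The bitfield is deliberately left as a string, since turning a long run of ones and zeros into a decimal number would be meaningless.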
And from there you can do what you like with the data of course – one idea might be to read every minute and turn that string bitfield into an integer, totalling up errors in the lower bits (you could just read bits of the string to achieve the same thing) … after a period send off a report email…
No need to report over-heating as the governor should take care of that – however – min-max summary in the email might be nice while testing.
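The once-a-minute tally idea might look something like this – a minimal sketch only, assuming the bitfield arrives as a string of ones and zeros (a hex value would need a different parse). The names FLAGS, decodeBits and tally are my own, and bit 0 (under-voltage) is included on the assumption that it mirrors bit 16, as in the Raspberry Pi documentation for vcgencmd get_throttled.

```javascript
// Bit positions as listed in the post; bit 0 is an assumption (see lead-in)
var FLAGS = {
    0: "under-voltage",
    1: "arm frequency capped",
    2: "currently throttled",
    16: "under-voltage has occurred",
    17: "arm frequency capped has occurred",
    18: "throttling has occurred"
};

// Turn a binary string into an integer plus a list of the flags that are set
function decodeBits(bitString) {
    var value = parseInt(bitString, 2);
    var active = Object.keys(FLAGS).filter(function (bit) {
        return (value >> Number(bit)) & 1;
    }).map(function (bit) {
        return FLAGS[bit];
    });
    return { value: value, active: active };
}

// Add one reading to a running tally of the on-going (lower) bits
function tally(counts, bitString) {
    var value = parseInt(bitString, 2);
    [0, 1, 2].forEach(function (bit) {
        if ((value >> bit) & 1) {
            counts[FLAGS[bit]] = (counts[FLAGS[bit]] || 0) + 1;
        }
    });
    return counts;
}
```

In a Node-Red function node the running counts could be kept between messages with context.get("counts") and context.set("counts", counts), then emailed and reset after whatever period suits.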