An open letter on the strategic impact of using BadMEM

Recently I read a long thread at Slashdot discussing the positive and negative impacts of using any kind of BadMEM/BadRAM technology. Leaving aside several flaming messages, I can summarize the 260+ KB of text as a good discussion of the strategic pros and cons.

Introduction

Here I would like to comment on some of the main points that I think some writers at Slashdot got wrong. Please note that the text below reflects only my personal opinion on this issue. My intention is NOT to flame or accuse any author at Slashdot, but to help everybody out there make up his/her mind when weighing the different advantages and disadvantages of BadMEM.

First, I must try to clear up a misunderstanding:
Any current usage of BadMEM assumes that the defects of your memory modules are static. If holes appear randomly (especially if their positions change over time), you will encounter severe problems, even if you use the BadMEM patch extensively. In its current form, a non-dynamic defect of your RAM is an absolute precondition. Although this lets us restrict the present discussion to statically defective RAM, the point itself cannot be skipped.
Many authors at Slashdot muddled this basic point. The most frequent argument against using BadMEM was that you can never really trust a memory module which already has some bad bits to stay in that state. I think that is only half of the truth. Let's take two examples: Imagine a module which got a broken pin during production that happened to slip through the quality checks; take another module which has a production flaw in its crystalline structure, leading to temporary data losses at high temperatures. Which of the modules do you trust more? Can you trust even one of them? Or, going further: Can you trust any purchased module at all?

Personal decision on usage

First, you must see that there is no universal answer. It is a philosophical question which everyone must answer - including you. It depends heavily on your experience and your environment. If I were the head administrator of a big, centralized company, I would never use questionable RAM modules in my production environment (most likely I would never even have the time to play around with them in an experimental environment). But if I were a poor student who could save several dollars in exchange for some hours of testing time, which the PC can normally do overnight, things would look worthwhile again.
I blame these differences in behaviour, habits and experience for having made the discussion at Slashdot so emotional. To get out of this trap in the strategic development of BadMEM, I would like to suggest the following differentiation before continuing:

*Table 1: summarized economic aspects of BadMEM*
Aspect	Comment
Risk of Failure	As already mentioned above: the risk of a failure will always influence your decision about usage. I would never use suspicious memory in a risky (e.g. medical) environment.
Type of Failure	You must always take the known (!) cause of the failure into account. Bad soldering will not give me a headache, but crystalline degeneration always will.
Quantity of Failure	A module that has lost almost all of its usable memory will not do, but a module in which only a single chip is provably and statically damaged would.
Own skills	A Linux newbie should not use such a thing - at least not if (s)he is alone and lacks the necessary support from someone else.
Vendor and manufacturer	My trust in a specific manufacturer and its vendor will have an impact on the decision, too. If I have the skills to judge the quality of the production process and I could see that process, then my decision will depend heavily on what I experienced during the visit.

(Please note that I do not claim that the list above is complete!) As you can see, this discussion leads towards common economic problems which are often solved by Rational Decision Theory with its measured weighting and decision functions (if you are a German-speaking reader, you might want to have a look at "Unternehmenspolitik" by Kieser/Oechsler, Schäffer/Poeschel, 1999, p. 58f and 70f). In my opinion it does not make sense to explain all the details of this theory here. The only thing I would still like to mention is that making decisions is something that makes us all human beings. Therefore I want to make you think about your own decision - whether you are pro BadMEM or contra. While you are speaking against it and damning it to hell, another might see a new era dawning. Is only one opinion correct?
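To make the idea of a weighting and decision function a little more tangible, here is a minimal sketch in Python. It is not taken from the cited book: the weights, the 0.0 to 1.0 rating scale, the threshold and all names (`WEIGHTS`, `weighted_score`, `decide`) are assumptions chosen purely for illustration. Each aspect of Table 1 receives a rating (1.0 means "fully in favour of using the module") and a weight expressing how much that aspect matters to you.

```python
# Hypothetical weights for the aspects of Table 1; they sum to 1.0.
# These numbers are illustrative assumptions, not measured values.
WEIGHTS = {
    "risk_of_failure": 0.35,
    "type_of_failure": 0.25,
    "quantity_of_failure": 0.20,
    "own_skills": 0.10,
    "vendor_and_manufacturer": 0.10,
}


def weighted_score(ratings, weights=WEIGHTS):
    """Return the weighted sum of the per-aspect ratings (each 0.0..1.0)."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    for aspect in weights:
        if not 0.0 <= ratings[aspect] <= 1.0:
            raise ValueError(f"rating for {aspect} must be within 0.0..1.0")
    return sum(weights[aspect] * ratings[aspect] for aspect in weights)


def decide(ratings, threshold=0.6, weights=WEIGHTS):
    """Decision function: True means 'use the module', False means 'do not'."""
    return weighted_score(ratings, weights) >= threshold
```

With these assumed weights, a student who rates the risk as harmless (0.8) ends up above the threshold, while an administrator who rates the same module's risk as unacceptable (0.1) ends up below it, although both look at the same hardware. Choosing the weights is exactly the personal part of the decision.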

Economic decision

While many people (at Slashdot) are discussing at the level of personal usage, a minority tried to shed light on some economic problems. I read only one message talking about the logistical problems: As we can deduce from Table 1, it is likely that every single module has its own characteristics. This makes modules complicated to compare. Therefore, adequate standards of comparison must be developed. In the same manner as Table 1, here is a suggestion for the dimensions of classification:

*Table 2: classification dimensions*
Dimension	Comment
Quantity of Failure	The number of bad bits in the module should always be taken into consideration; for static damage, there are already two types of classification: "BadRAM Class" and "BadMEM Type". For further information, please have a look at the BadMEM-4096 Specification and at Rick's Page.
Quality of Failure	Purely static damage has a fundamentally different character from temperature-dependent failures and therefore leads to a different degree of usability.

The dimensions above mainly aim at a single question: 'How can we estimate the probability of a memory failure?'. Please note in this context that a memory module might have two (or even more) different types of failure. All of them have to be taken into consideration.
So, standardization might have a certain impact on the logistical problems mentioned. Again, the necessary extent of standardization must be determined by the markets.
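As a small illustration of why every failure type has to be counted, the following sketch combines several per-type failure probabilities into one overall figure. It rests on a strong assumption that the letter does not make: the failure types are treated as statistically independent, so that the module fails if at least one of them occurs, i.e. P = 1 - (1 - p1)(1 - p2)... The function name and the example probabilities are hypothetical.

```python
from math import prod


def combined_failure_probability(mode_probabilities):
    """Probability that at least one failure type occurs.

    Assumes the failure types are independent of each other, which is
    a simplification made only for this sketch.
    """
    probabilities = list(mode_probabilities)
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must be within 0.0..1.0")
    # The module survives only if every single failure type stays away.
    return 1.0 - prod(1.0 - p for p in probabilities)
```

For example, a module with an assumed 1% chance of a static-defect failure and an assumed 5% chance of a temperature failure in some period would have a combined failure probability of about 5.95%, which is higher than either figure alone. Real failure types may well be correlated, in which case this simple product no longer holds.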

Some authors at Slashdot were afraid that money-oriented "business men" could sell BadMEM PCs as new, 100% OK PCs, cheating the ordinary users out there. This economic aspect of the BadMEM issue simply leads to a philosophical problem: Should you develop something that people might use for bad purposes? Again, we must ask about the risks and about what can be gained. In my opinion it is worthwhile working on BadMEM, as we can take several precautions against misuse:

Short-Term: It is not likely that there will be many "misled business men" who make their money out of BadMEM. As I am seeking integration into the standard kernel series, ordinary users should not have much additional work.
Long-Term: BadMEM should try to influence the memory-producing industry to integrate some sort of "bad bit list" into their BadMEM modules. As BadMEM should by then have been integrated into the standard Linux kernel, a short test of the memory modules can be done, so that no impact should be visible to the user. To make misuse harder, BadMEM should display a user-readable warning that the current PC uses the BadMEM feature to lock several bad areas which were marked by the manufacturer of the memory module.

Admittedly, this does not provide complete security, but 'total security' is always an illusion. However, the purchase of BadMEM PCs will likely have a positive impact on the development of Linux, too: in the early days of this feature, Linux will be the only operating system that works on this sort of PC. In any case I want to make clear that I am no friend of these dubious "business men", because they are not interested in creating wealth, but in making money.
In the end, one can still say that the unethical behaviour of a minority should not stop the majority from making progress.

Another important economic aspect is the question of warranties. The issue becomes very complex if you think not just of static damage in the RAM, but also of dynamic damage. As there is no market for bad memory modules yet, I suggest skipping this problem until there is one.

Outlook

The discussion at Slashdot has shown me a common need: the on-the-fly feature, namely checking suspicious RAM during the idle time of the Linux kernel and dynamically setting aside "broken" memory, is a central wish of the emerging BadMEM community. Therefore, I want to make this dream become reality.

Other resources

http://www2.linuxjournal.com/cgi-bin/frames.pl/lj-issues/issue83/4489.html