@theruran @psf Worth reading the original paper on defective CPUs. sigops.org/s/conferences/hotos

> A deterministic AES miscomputation, which was “self-inverting”: encrypting and decrypting on the same core yielded the identity function, but decryption elsewhere yielded gibberish.
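
To make "self-inverting" concrete, here is a toy sketch. It is not AES and makes no claim about the actual defect in the paper: it assumes a hypothetical core that always flips the same bit in a keystream, and every name in it (`keystream`, `xor_crypt`, the `faulty` flag) is invented for illustration.

```python
import hashlib

def keystream(key: bytes, n: int, faulty: bool = False) -> bytes:
    """Toy keystream (SHA-256 in counter mode). NOT AES; illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        out.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    ks = bytearray(out[:n])
    if faulty:
        # Assumed defect: the core deterministically flips bit 4 of every byte.
        for i in range(len(ks)):
            ks[i] ^= 0x10
    return bytes(ks)

def xor_crypt(key: bytes, data: bytes, faulty: bool = False) -> bytes:
    """XOR the data with the keystream; the same call encrypts and decrypts."""
    ks = keystream(key, len(data), faulty)
    return bytes(a ^ b for a, b in zip(data, ks))

key = b"example key"
msg = b"attack at dawn"

ciphertext = xor_crypt(key, msg, faulty=True)         # encrypted on the bad core
same_core = xor_crypt(key, ciphertext, faulty=True)   # decrypted on the bad core
other_core = xor_crypt(key, ciphertext, faulty=False) # decrypted on a good core

print(same_core == msg)   # True: the fault cancels itself out
print(other_core == msg)  # False: a healthy core cannot undo it
```

Because the wrong answer is the same wrong answer every time, a round trip on the bad core looks perfectly healthy, which is why a local encrypt-then-decrypt self-test would not catch it.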

@niconiconi @psf yeah - finding out that the only computer that can decrypt and access your data is the one with a #mercurialCore that originally encrypted it is 'fun.'
/cc @stman

@stman @theruran @yaaps @niconiconi @psf

This reminds me of a story from a flatmate who repaired machines for a supplier that had a warehouse in North London.

Their workflow meant that the desktop machines would arrive at the warehouse and be stored there before being sent to the test lab where my flatmate worked.

They would test the machines, then send them back to a different part of the warehouse before the machines were sent to the customer's offices.

@stman @theruran @yaaps @niconiconi @psf

When the machines were in the offices they would start crashing after 3 days.

So replacements were sent out, and the original machines would be sent back to the warehouse where they were stored until they could be tested again.

All of the tests came out fine, so they were sent back to the warehouse before being sent out to a different set of customers.

Rinse and repeat for several iterations before they got serious about tracing the problem.

@stman @theruran @yaaps @niconiconi @psf

Eventually someone noticed that the machines that were failing had a common element.

They were using +/-10% rated-value resistors.

When they started testing the resistors, they found that ALL of the measured values were either -10% to -5% or +5% to +10% off the rated value.

NONE of them were in the centre bands.

@stman @theruran @yaaps @niconiconi @psf

That's when they worked out that the resistor manufacturer had been cherry-picking the resistors from the manufacturing process.

All of the most accurate resistors went into the +/-0.1% product line, the next batch went into the +/-2% product line, then +/-5%, and the rest into the +/-10% product line that my flatmate came across.

He ordered batches directly from a range of resistor suppliers.

ALL of the resistor manufacturers were doing it.
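
A rough simulation of that cherry-picking, as a sketch only: it assumes a Gaussian spread around the nominal value, a 4% standard deviation and a 1 kOhm nominal, none of which come from the story. The tolerance bands are the ones named above.

```python
import random

def bin_resistors(n=100_000, nominal=1000.0, sigma_pct=4.0, seed=0):
    """Sort simulated resistors into the tightest tolerance band they meet."""
    rng = random.Random(seed)
    tiers = [0.1, 2.0, 5.0, 10.0]  # bands from the story, in percent
    bins = {t: [] for t in tiers}
    for _ in range(n):
        value = rng.gauss(nominal, nominal * sigma_pct / 100)  # assumed spread
        dev = abs(value - nominal) / nominal * 100
        for t in tiers:
            if dev <= t:
                bins[t].append(value)  # tightest matching band takes the part
                break
        # anything outside +/-10% is treated as scrap and dropped
    return bins

bins = bin_resistors()
worst_band = bins[10.0]
deviations = [abs(v - 1000.0) / 1000.0 * 100 for v in worst_band]
print(len(worst_band), "parts in the +/-10% band")
print("smallest deviation in that band: %.2f%%" % min(deviations))
```

Under these assumptions every part sold as +/-10% is more than 5% off nominal, because everything closer was already skimmed into a pricier band. That is the hole in the centre that the measurements showed.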

@stman @theruran @yaaps @niconiconi @psf

If you wanted an accurately-specced resistor, you had to buy the most expensive resistors, otherwise you were just having to guess whether the components would work on the circuit boards.

The reason that the PCs were working in the test lab, but not in the customer's offices, is that they didn't get the chance to warm up enough to fail: the warehouse was unheated, but the offices were at room temperature.

@stman @theruran @yaaps @niconiconi @psf

It wouldn't surprise me if the CPU manufacturers were doing the same.

Test the chips and sell the most accurate versions at the highest prices, and have a set of band ranges for the rest.

I know that Intel WAS doing this in the early '90s, but changed the way they did things after they were sued by some banks that had spent a LOT of money buying math co-processors that failed if you pushed them too far.

@stman @theruran @yaaps @niconiconi @psf

Someone at the CPU manufacturer has fired the staff that knew this failure mode, and there's been a corresponding loss of institutional memory.

The CPU manufacturer has been banding the chips to increase their margins by creating differential product lines.

Someone at the computer manufacturer has been trying to improve their margins by buying the cheaper chips.

Someone at Google/FB has been shaving their costs by buying cheaper machines.

@stman @theruran @yaaps @niconiconi @psf

But this time it's operating at the remote data centre level, rather than the desktop PC level.

Time to benchmark every chip that you buy, and sue the maker if it's not up to spec.

Also time to start shorting the CPU maker's stock, as Google/FB have enough cash to effectively sue without settling. :D

@BillySmith @stman @theruran @yaaps @psf I won't be surprised if I see a comprehensive CPU test suite or online monitoring tool on GitHub by Google or Facebook a few years later.
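
A minimal sketch of what the simplest per-core check might look like, purely as an assumption about the idea and not any real Google or Facebook tool. It pins the process to each core in turn (via `os.sched_setaffinity`, which is Linux-only), runs the same deterministic workload, and flags cores whose answer differs from the majority. `workload`, `check_cores` and `disagreeing_cores` are made-up names.

```python
import hashlib
import os

def workload(seed: int, rounds: int = 20_000) -> str:
    """Deterministic integer and hashing workload; returns a hex digest."""
    h = hashlib.sha256()
    x = seed
    for _ in range(rounds):
        x = (x * 6364136223846793005 + 1442695040888963407) % 2**64
        h.update(x.to_bytes(8, "little"))
    return h.hexdigest()

def check_cores(seed: int = 42) -> dict:
    """Run the workload pinned to each allowed core; return {core: digest}."""
    results = {}
    if hasattr(os, "sched_setaffinity"):  # only available on Linux
        original = os.sched_getaffinity(0)
        try:
            for cpu in sorted(original):
                try:
                    os.sched_setaffinity(0, {cpu})
                except OSError:
                    continue  # pinning refused for this core; skip it
                results[cpu] = workload(seed)
        finally:
            os.sched_setaffinity(0, original)  # restore the original mask
    if not results:
        results[-1] = workload(seed)  # fallback: one unpinned run
    return results

def disagreeing_cores(results: dict) -> list:
    """Return the cores whose digest differs from the most common digest."""
    digests = list(results.values())
    majority = max(set(digests), key=digests.count)
    return [cpu for cpu, d in results.items() if d != majority]

if __name__ == "__main__":
    results = check_cores()
    print("cores checked:", len(results))
    print("disagreeing cores:", disagreeing_cores(results))
```

A single pass like this would only catch a core that is wrong on this exact workload at this moment. Anything intermittent, data-dependent or temperature-dependent (like the machines that only crashed once they warmed up) would need many workloads run repeatedly over time.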

@BillySmith @stman @theruran @yaaps @psf The resistor tolerance story is a classic of electronics folklore. Everyone eventually hears a variation of the same story (or, if they are unlucky, gets first-hand experience) after a few years in electronics.

@niconiconi @stman @theruran @yaaps @psf

My flatmate showed me the machines that he was working on, as well as showing me the results from the component tests that he performed. :D

He got a pay-raise from that, while the idiot who tried to cut the quality was made redundant.

That whole company was shuttered two years later, as no-one trusted that brand, so stopped buying their machines.

It may be folklore, but I saw it happen. :D

@BillySmith @stman @theruran @yaaps @psf > as no-one trusted that brand, so stopped buying their machines.
There's a saying for this - "worst-case tolerances never add - but when they do, they are found in the best customer's machine." gunkies.org/wiki/Vonada%27s_En
