@theruran @psf Worth reading the original paper on defective CPUs. https://sigops.org/s/conferences/hotos/2021/papers/hotos21-s01-hochschild.pdf
> A deterministic AES miscomputation, which was “self-inverting”: encrypting and decrypting on the same core yielded the identity function, but decryption elsewhere yielded gibberish.
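To make the "self-inverting" idea concrete, here is a toy sketch. It is not real AES and not how the paper's faulty core actually behaved: the one-byte XOR "cipher", `KEY`, and `FAULT_MASK` are all made-up stand-ins. It only shows how a fault that is applied the same way in both directions cancels out on the same core, so a same-core round-trip test passes while decryption on a healthy core fails.

```python
# Toy model only: a one-byte XOR "cipher" stands in for AES.
KEY = 0x5A          # hypothetical key
FAULT_MASK = 0x10   # hypothetical bit flipped by the defective core


def encrypt(byte, faulty=False):
    """Encrypt one byte; a faulty core flips an extra bit in the output."""
    out = byte ^ KEY
    return out ^ FAULT_MASK if faulty else out


def decrypt(byte, faulty=False):
    """Decrypt one byte; a faulty core flips the same bit in the input."""
    if faulty:
        byte ^= FAULT_MASK
    return byte ^ KEY


def round_trip(data, enc_faulty, dec_faulty):
    """Encrypt on one core and decrypt on another (or the same one)."""
    return bytes(decrypt(encrypt(b, enc_faulty), dec_faulty) for b in data)


message = b"hello"
same_core = round_trip(message, enc_faulty=True, dec_faulty=True)
cross_core = round_trip(message, enc_faulty=True, dec_faulty=False)

print(same_core == message)   # the fault cancels itself out
print(cross_core == message)  # a healthy core cannot undo it
```

The point is that a self-test that encrypts and decrypts on the same core would never notice this kind of fault. You would have to compare against a known-good answer or a different core.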
This reminds me of a story from a flatmate who repaired machines for a supplier that had a warehouse in North London.
Their workflow meant that the desktop machines would arrive at the warehouse and be stored there, before being sent to the test lab where my flatmate worked.
They would test the machines, then send them back to a different part of the warehouse before the machines were sent to the customers' offices.
When the machines were in the offices they would start crashing after 3 days.
So replacements were sent out, and the original machines would be sent back to the warehouse where they were stored until they could be tested again.
All of the tests came out fine, so they were sent back to the warehouse before being sent out to a different set of customers.
Rinse and repeat for several iterations, before they got serious about tracing the problem.
Eventually someone noticed that the machines that were failing had a common element.
They were using resistors with a ±10% tolerance rating.
When they started testing the resistors, they found that ALL of them measured either 5% to 10% below the rated value, or 5% to 10% above it.
NONE of them were in the centre bands.
If you wanted an accurately-specced resistor, you had to buy the most expensive resistors, otherwise you were just having to guess whether the components would work on the circuit boards.
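A rough sketch of how that kind of gap could come about, assuming the maker measures every part and sells the ones within ±5% as the pricier grade. That binning rule, the normal spread, and names like `bin_resistors` and `sigma_frac` are my assumptions for illustration, not something my flatmate measured.

```python
import random


def bin_resistors(nominal, count, sigma_frac=0.05, seed=0):
    """Simulate a production run and sort parts into tolerance grades.

    Assumed rule: parts within 5% of nominal are sold as the tight grade,
    parts between 5% and 10% are sold as the cheap +/-10% grade,
    and anything further out is scrapped.
    """
    rng = random.Random(seed)
    tight, loose = [], []
    for _ in range(count):
        value = rng.gauss(nominal, nominal * sigma_frac)
        deviation = abs(value - nominal) / nominal
        if deviation <= 0.05:
            tight.append(value)
        elif deviation <= 0.10:
            loose.append(value)
    return tight, loose


tight, loose = bin_resistors(nominal=1000.0, count=10_000)

# Every part left in the cheap grade sits in one of the two outer bands.
closest = min(abs(v - 1000.0) / 1000.0 for v in loose)
print(f"cheap-grade parts: {len(loose)}, closest to nominal: {closest:.1%} off")
```

Under that assumption the cheap grade has a hole in the middle of its distribution, which matches what the tests found.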
The reason that the PCs worked in the test lab, but not in the customers' offices, is that in the lab they never got the chance to warm up enough to fail: the warehouse was unheated, but the offices were at room temperature.
It wouldn't surprise me if the CPU manufacturers were doing the same.
Test the chips and sell the most accurate versions at the highest prices, and have a set of band ranges for the rest.
I know that Intel WAS doing this in the early '90s, but changed the way they were doing things after they were sued by some banks that had spent a LOT of money buying the maths co-processors that failed if you pushed them too far.
Someone at the CPU manufacturer has fired the staff that knew this failure mode, and there's been a corresponding loss of institutional memory.
The CPU manufacturer has been banding the chips to increase their margins by creating differential product lines.
Someone at the computer manufacturer has been trying to improve their margins by buying the cheaper chips.
Someone at Google/FB has been shaving their costs by buying cheaper machines.
But this time it's operating at the remote data centre level, rather than the desktop PC level.
Time to benchmark every chip that you buy, and sue the maker if it's not up to spec.
Also time to start shorting the CPU maker's stock, as Google/FB have enough cash to effectively sue without settling. :D
My flatmate showed me the machines that he was working on, as well as showing me the results from the component tests that he performed. :D
He got a pay rise from that, while the idiot who tried to cut the quality was made redundant.
That whole company was shuttered two years later, as no-one trusted the brand any more and people stopped buying their machines.
It may be folklore, but I saw it happen. :D