calicoding

Ok I wasn’t sure at first, but it seems like the performance issue I’m facing is partially due to ref counting. It’s not free!

I’m working in a system I didn’t design, that relies heavily on reference types and inheritance. Given the chance, I probably would have designed it differently, leaning more on value types.
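To illustrate the trade-off being described (all names here are hypothetical, not from the actual system): a class instance is heap-allocated and reference-counted, so every assignment and every pass through a collection can touch the refcount, while a struct of plain values is copied inline with no retain/release traffic.

```swift
// Reference type: every copy of the reference retains, every scope exit releases.
final class NodeRef {
    var value: Int
    init(value: Int) { self.value = value }
}

// Value type: copies are plain bitwise-style copies, no refcount involved.
struct NodeVal {
    var value: Int
}

func sumRefs(_ nodes: [NodeRef]) -> Int {
    // Iterating class instances can generate retain/release traffic per element.
    nodes.reduce(0) { $0 + $1.value }
}

func sumVals(_ nodes: [NodeVal]) -> Int {
    // Same logic over value types: no per-element retain/release.
    nodes.reduce(0) { $0 + $1.value }
}
```

This is only a sketch of the shape of the design choice; whether the refcount traffic matters depends entirely on how hot the code path is.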

LOL so yes, ref counting was a bottleneck, but only because I’m an idiot and accidentally called my heavy quadratic calculation hundreds of times more than I needed to 🤦‍♂️

I wish this was easier to see in the profiler. But also: check your assumptions!

Accidentally quadratic? Pffft that’s rookie numbers. This was accidentally quartic
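The "accidentally quartic" pattern above can be sketched like this (a hypothetical reconstruction, not the actual code): an O(n²) helper invoked from inside an O(n²) loop, when its result doesn't depend on the loop variables at all.

```swift
// O(n^2) helper: sum of all pairwise products.
func pairwiseSum(_ xs: [Int]) -> Int {
    var total = 0
    for a in xs { for b in xs { total += a * b } }
    return total
}

// Accidentally O(n^4): recomputes the same quadratic result n^2 times.
func slowScore(_ xs: [Int]) -> Int {
    var score = 0
    for _ in xs {
        for _ in xs {
            score += pairwiseSum(xs)   // loop-invariant: doesn't use the loop vars!
        }
    }
    return score
}

// Fixed: hoist the invariant call; back to O(n^2) overall.
func fastScore(_ xs: [Int]) -> Int {
    let cached = pairwiseSum(xs)
    return xs.count * xs.count * cached
}
```

A profiler shows `pairwiseSum` dominating either way, which is why the extra ×n² of call sites is easy to miss until you check your assumptions about how often it runs.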

@calicoding I saw your first post and was like “hmmmmm”, but this makes more sense

@mattiem @calicoding Ref counting actually is expensive because it’s thread-safe; it’s the global interpreter lock of Swift.
(and it doesn’t just affect user types, all the CoW types use it)
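The point about CoW (copy-on-write) types can be sketched with a toy container (illustrative names, not the stdlib's actual implementation): the value type wraps a reference-counted buffer and checks uniqueness, which is a read of that thread-safe refcount, before every mutation.

```swift
// Reference-counted backing store shared between value-type copies.
final class Storage {
    var elements: [Int]
    init(_ elements: [Int]) { self.elements = elements }
}

struct CoWArray {
    private var storage: Storage
    init(_ elements: [Int]) { storage = Storage(elements) }

    var elements: [Int] { storage.elements }

    mutating func append(_ x: Int) {
        // This uniqueness check is why even "pure value" code still leans
        // on thread-safe reference counting under the hood.
        if !isKnownUniquelyReferenced(&storage) {
            storage = Storage(storage.elements)   // copy only when shared
        }
        storage.elements.append(x)
    }
}
```

So `var a = CoWArray([1]); let b = a; a.append(2)` copies the buffer exactly once, at the first mutation, and `b` keeps seeing `[1]`.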

@mattiem @calicoding Something requiring a lock of some sort, resulting in cross-core cache flushes and such (I don’t know the actual effects; it probably depends a lot on the platform)

@helge @mattiem @calicoding on Apple Silicon™️, the cost at least only shows up if you have actual contention accessing the refcount

@joe @mattiem @calicoding Even if there is some atomic instruction doing the thing, the cores would still have to synchronize, i.e. flush their pipelines, no? I don’t know much about such low levels, and some info on why it isn’t expensive would be welcome 🙃
Or how expensive it is compared to a simple rc++ increment.
My assumption is that atomic RC is massively more expensive; is that wrong?

@helge @mattiem @calicoding the particular instructions used for refcounting get speculatively executed as if they were nonatomic, so in the case where there’s no contention there’s very little overhead (the atomic codegen still needs a few more instructions than a nonatomic update, etc.). If it turns out later that the memory location was contended, you throw all that work away and do it properly.

@joe @helge @mattiem in my case, everything was run on the main thread, so I don’t think there was any contention.

Profiling in release mode made the results in the Time Profiler instrument harder to interpret. In this case it actually helped to profile in debug: fewer crazy optimizations. But wow, array indexing in debug mode seems like it has a ton going on

@calicoding @helge @mattiem yeah debug profiling probably isn't very representative of optimized codegen, because so much is left explicit from the standard library implementation

@joe @helge @calicoding even CPU instructions don’t want to work anymore this is getting ridiculous

@mattiem @joe @calicoding There is a reason why "lazy" is a keyword in Swift
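For readers outside the joke: `lazy` defers a stored property's initializer until first access, so a heavy computation runs at most once, and only if it's actually needed. A minimal sketch with hypothetical names, using a counter to make the evaluation visible:

```swift
var heavyCallCount = 0

// Stand-in for some expensive calculation.
func heavySum(_ xs: [Double]) -> Double {
    heavyCallCount += 1
    return xs.reduce(0, +)
}

final class Report {
    let samples: [Double]
    init(samples: [Double]) { self.samples = samples }

    // Not evaluated at init time; computed on first access, then cached.
    lazy var mean: Double = heavySum(samples) / Double(max(samples.count, 1))
}

let r = Report(samples: [2, 4, 6])
// heavyCallCount is still 0 here: nothing has been computed yet.
let m = r.mean     // first access triggers the computation (m == 4)
_ = r.mean         // second access reuses the cached value
// heavyCallCount is 1: the heavy work ran exactly once.
```

Note that `lazy` in Swift is not thread-safe on its own; concurrent first accesses need external synchronization.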

@mattiem already working from home what more do they want

@helge @mattiem @calicoding a good chunk (most?) of the remaining overhead in swift_retain/release is in the call and dyld thunk, to the point we've been contemplating horrible ways to avoid that without giving up ABI flexibility for the object header