DeepSeek-V3-4bit runs at >20 tokens per second while drawing <200W using MLX on an M3 Ultra with 512GB. This might be the best and most user-friendly way to run DeepSeek-V3 on consumer hardware, and possibly the most affordable too. You can finally run a GPT-4o-level model locally, possibly with even better quality. #LLM #AI #ML #DeepSeek #OpenAI #GPT #OpenWeight #OpenSource https://venturebeat.com/ai/deepseek-v3-now-runs-at-20-tokens-per-second-on-mac-studio-and-thats-a-nightmare-for-openai/
@chikim Whoa, an M3 Ultra with 512GB is considered consumer hardware? LOL, I'd like to be one such consumer!