Total noob to this space, correct me if I’m wrong. I’m looking at getting new hardware for inference and I’m open to AMD, NVIDIA or even Apple Silicon.

It feels like consumer hardware comparatively gives you more value generating images than trying to run chatbots. Like, the models you can run at home are just dumb to talk to. But they can generate images of comparable quality to online services if you’re willing to wait a bit longer.

Like, GPT-OSS 120B, assuming you can spare 80GB of memory, is still not GPT-5. But Flux Schnell is still Flux Schnell, right? So if diffusion is the thing, NVIDIA wins right now.
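For context, the image side I mean looks roughly like this (a sketch using Hugging Face diffusers with the public FLUX.1-schnell checkpoint; actual VRAM needs depend on offloading and quantization, so treat it as a starting point rather than a benchmark):

```python
import torch
from diffusers import FluxPipeline

# Load the distilled "schnell" variant of Flux in bf16.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
# Offload weights to CPU RAM between steps so it fits on smaller GPUs.
pipe.enable_model_cpu_offload()

image = pipe(
    "a photo of a cat holding a sign that says hello world",
    guidance_scale=0.0,        # schnell is guidance-distilled, so no CFG
    num_inference_steps=4,     # schnell targets ~4 steps
    max_sequence_length=256,
).images[0]
image.save("flux-schnell.png")
```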

Other hardware might even be better for other uses, but chatbots are comparatively hard to justify. Maybe for more specific cases like zero-latency code completion or building a voice assistant, I guess.

Am I too off the mark?

  • tal@lemmy.today · 15 days ago

    “It feels like consumer hardware comparatively gives you more value generating images than trying to run chatbots”

    While I personally get more use out of the hardware that way, you’re also posting this to an LLM community. You’re probably going to get people who do use LLMs more.

    I also don’t think that “image diffusion models are small, LLMs are large” is some sort of constant (I’m sure the image generation people can make use of larger models), and the hardware is going to be a moving target. Those Framework Desktop machines have up to 128GB of unified memory, for example.
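    As a rough illustration of why something like 128GB of unified memory matters (back-of-envelope weight math only; it ignores KV cache, activations, and runtime overhead):

    ```python
    # Rough weight-memory estimate: parameters (in billions) * bits per weight / 8.
    def weight_gb(params_billion: float, bits_per_weight: float) -> float:
        return params_billion * bits_per_weight / 8

    print(weight_gb(120, 4))   # ~60 GB: a 120B model at 4-bit fits in 128GB
    print(weight_gb(120, 16))  # ~240 GB: the same model at bf16 does not
    ```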

    • rkd@sh.itjust.works (OP) · 15 days ago

      That’s a good point, but it seems there are several ways to make models fit on hardware with less memory. There aren’t many ways to compensate for not having the ML data types that let NVIDIA be like 8x faster sometimes.
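      Quantized loading is the kind of thing I mean, something like this sketch with transformers + bitsandbytes (the model ID is just a placeholder, and it also needs the accelerate and bitsandbytes packages installed):

      ```python
      import torch
      from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

      model_id = "mistralai/Mistral-7B-Instruct-v0.3"  # placeholder model

      # Ask transformers to quantize the weights to 4-bit at load time.
      bnb_config = BitsAndBytesConfig(
          load_in_4bit=True,
          bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls still run in bf16
      )

      tokenizer = AutoTokenizer.from_pretrained(model_id)
      model = AutoModelForCausalLM.from_pretrained(
          model_id,
          quantization_config=bnb_config,
          device_map="auto",  # spread layers across GPU/CPU as memory allows
      )
      ```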