For coding AI, it could make sense to specialize models: one on architecture, and separate ones for functional/array-style versus loop-based solutions. Or just ask 4 separate small models, and then use a judge model to pick the best parts of each.
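A rough sketch of the "ask several small models, then let a judge choose" idea, to make it concrete. Everything here is a hypothetical stand-in: the "models" are plain Python functions returning canned text, and `ask_all`, `judge_pick`, and the stub names are made up for illustration, not any real library's API. It also simplifies by having the judge pick one whole answer rather than merge the best parts of each.

```python
from typing import Callable, Dict

# Assumption: a "model" is any function from a prompt to a text answer.
# In practice this would wrap a call to a locally hosted model.
Model = Callable[[str], str]


def ask_all(models: Dict[str, Model], prompt: str) -> Dict[str, str]:
    """Send the same prompt to every specialist and collect answers by name."""
    return {name: model(prompt) for name, model in models.items()}


def judge_pick(judge: Model, prompt: str, candidates: Dict[str, str]) -> str:
    """Ask the judge for the best candidate's name and return that answer."""
    listing = "\n\n".join(f"[{name}]\n{text}" for name, text in candidates.items())
    verdict = judge(
        f"Task: {prompt}\n\nCandidates:\n{listing}\n\n"
        "Reply with the best candidate's name only."
    )
    choice = verdict.strip()
    # Fall back to the first candidate if the judge names something unknown.
    return candidates.get(choice, next(iter(candidates.values())))


# Hypothetical stub specialists and judge with canned replies.
def functional_model(prompt: str) -> str:
    return "def total(xs): return sum(xs)"


def loopy_model(prompt: str) -> str:
    return "def total(xs):\n    t = 0\n    for x in xs:\n        t += x\n    return t"


def stub_judge(prompt: str) -> str:
    return "functional"


models = {"functional": functional_model, "loopy": loopy_model}
task = "Sum a list of numbers."
candidates = ask_all(models, task)
best = judge_pick(stub_judge, task, candidates)
print(best)
```

Swapping the stubs for real model calls keeps the same shape: the specialists only need to agree on "prompt in, text out", and the judge only needs to return a name.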
Yep, I’m actually getting pretty comparable results running Devstral as an agent compared to a big model like Claude.
You have to specify the problem better, but the tradeoff is I can run it locally.
I’m still in the experimental phase with local models for agents, but they’re already accomplishing whole tasks instead of just writing code.
I could see a collection of these agents passing tasks around being highly effective.