Playing with DeepSeek R1 Distill Qwen 1.5B:
So I tried out DeepSeek R1, specifically the distilled 1.5B version, because resources. lol. It's a tiny yet powerful 1.5B-parameter, non-quantized model trained with Group Relative Policy Optimization (GRPO) for reinforcement learning.
All of DeepSeek's models are open source, and DeepSeek has been making news lately for how they managed to pull off powerful models using little resources, even skipping steps everyone thought were necessary to develop powerful models. They did all this with small budgets and not-so-powerful GPUs.
And the fact that it’s open source is a whole other thing, because before all this, the only powerful open-source models were Meta’s lineup of LLaMA models. So, having a new player that’s just as powerful, costs a tenth of what industry leaders charge, and is open source is a really big deal.
Here are some of my takeaways from the hands-on experience:
And remember, this is the Tiny Tiny version, only 1.5B. The larger 7B+ models are far more capable, and DeepSeek has since added extra features, such as website search, improved agent functionality, and the ability to generate structured responses.
So, my personal observations:
-Token length matters
The default max length of 512 tokens won't be enough; the model runs out of tokens really fast, especially on complex reasoning tasks, because it spends most of them in the thinking stage drafting really long plans. So you have to start with a lot of tokens. Even with 3K tokens, it occasionally struggles to fully articulate long, intricate plans.
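One practical consequence: R1-style models wrap their chain of thought in `<think>...</think>` tags before the final answer, and when the token budget runs out mid-thought the closing tag (and the answer) never arrive. A minimal sketch of how you might detect that when using the model programmatically (the helper name `split_think` is my own, not part of any library):

```python
import re

def split_think(output: str):
    """Split an R1-style generation into (thinking, answer).

    Returns answer=None when the closing </think> tag is missing,
    which usually means generation was cut off mid-reasoning.
    """
    match = re.search(r"<think>(.*?)</think>(.*)", output, re.DOTALL)
    if match:
        return match.group(1).strip(), match.group(2).strip()
    # No closing tag: the model likely ran out of tokens while thinking.
    return output.replace("<think>", "").strip(), None

# A truncated generation: answer comes back as None, which is a good
# signal to retry with a larger max_new_tokens budget.
thinking, answer = split_think("<think>Let me plan: step 1, step 2, step 3")
```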
-Strengths:
Excels at technical tasks such as GSM8K math problems, producing consistent and well-formed outputs like LaTeX.
-Weaknesses:
Agents and Tool Use: Struggles with agent-like tasks, often bypassing tool use and directly producing answers.
This limitation is typical of reasoning models, as they're not really trained for agentic or tool-based workflows (as noted in the paper).
Creative Writing: Not as strong in creative writing or adopting nuanced personas compared to other models. Performance is okay but lacks the refinement found in models specifically trained for these tasks.
Output Structure: If structured output is essential, this model may not be what you are looking for.
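When I did need something machine-readable from it, the JSON (if any) tended to arrive buried in surrounding prose. A pragmatic workaround is to scan the reply for the first substring that actually parses; this sketch uses only the standard library, and the sample reply is made up for illustration:

```python
import json

def extract_json(text: str):
    """Return the first decodable JSON object found in text, else None."""
    decoder = json.JSONDecoder()
    for start in range(len(text)):
        if text[start] != "{":
            continue
        try:
            # raw_decode parses one JSON value starting at index `start`
            # and ignores whatever trailing prose follows it.
            obj, _ = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            continue
    return None

reply = 'Sure! Here is the result: {"answer": 42, "valid": true} Hope that helps.'
result = extract_json(reply)  # {'answer': 42, 'valid': True}
```

If nothing parses, `None` is your cue to re-prompt or fall back to a model trained for structured output.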
Verdict: Definitely worth checking out, especially to understand the current state of open-source language models, despite its limitations in specific use cases.
I've documented everything in a GitHub repository, including examples ranging from prime number detection to analytical writing tasks. Perfect for anyone interested in exploring the practical limits of smaller language models.
Check out the repo below:
#AI #MachineLearning #DeepLearning #NLP #OpenSource #TechInnovation #AIResearch #DeepSeek