Simon Willison's Weblog

That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.

There's a whole lot of stuff in the new release.
DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:

DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.
I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models are something I can easily play with myself. That's where the new distilled models come in.
To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.
This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).

Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?

(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)

Licensing aside, these distilled models are fascinating beasts.
Running DeepSeek-R1-Distill-Llama-8B-GGUF
Quantized versions are already starting to show up. So far I've tried just one of those, unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.

I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.
First I fetched and ran the model using Ollama itself:
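The command itself didn't survive in this copy of the post; a plausible reconstruction, assuming Unsloth's Q8_0 quantization on Hugging Face (the repository name below is an assumption), would be:

```shell
# Pull the GGUF quantization straight from Hugging Face and start a chat
# (assumes Ollama is installed and the hf.co repo path is correct)
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```

Ollama can fetch GGUF files directly from Hugging Face using this `hf.co/<user>/<repo>:<quant>` syntax, which is why no separate Modelfile is needed here.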
This downloads around 8.5GB of model data and starts an interactive chat interface.

Once the model has been fetched, LLM can talk to it as well. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.
I installed the plugin and ran it like this:
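The invocation was lost in this copy; a sketch of the two steps described, assuming the model identifier matches the one Ollama pulled above, would be:

```shell
# Install the plugin that lets LLM talk to models served by Ollama
llm install llm-ollama

# Run a prompt against the Ollama-hosted model; LLM logs the
# prompt and response to its SQLite database automatically
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```

The logged conversations can later be browsed with `llm logs`, which is what makes LLM convenient for this kind of experimentation.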
Each response starts with a `<think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:
Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.

First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're typically found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.
[...]
It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the `</think>` tag:

The Joke:
A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place saucers on the table, causing a few spills.

After a series of comical mishaps, the walrus looks up and says with a grin, "This isn't so fishy anymore." The pelican smirks and says, "Maybe not, but we do have a lot of krill in our tea!"
Explanation:
- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comedic chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.
So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these new models work.

This was a relatively small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B, currently my favourite GPT-4 class model that I've run on my own machine, I have high expectations.
Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this, a 34GB download:
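Again the command didn't survive; a sketch assuming the same Unsloth Hugging Face pattern as the 8B model (the repository name and the quantization level, chosen here to match the stated 34GB download size, are both assumptions):

```shell
# Pull and run the 70B distilled model; a low-bit quantization
# is assumed, since the full-precision weights would be far larger than 34GB
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```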
Can it draw a pelican?
I tried my classic Generate an SVG of a pelican riding a bicycle prompt too. It did not do very well:

It looked to me like it got the order of the elements wrong, so I followed up with:

the background ended up covering the rest of the image

It thought some more and gave me this:

As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.
Other ways to try out DeepSeek-R1
If you want to try the model out without installing anything at all you can do so using chat.deepseek.com: you'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.
<br>[DeepSeek](http://oxfordbrewers.org) use the design via their API, utilizing an OpenAI-imitating endpoint. You can access that by means of LLM by dropping this into your extra-openai-models. yaml configuration file:<br>
Then run llm keys set deepseek and paste in your API key, then use llm -m deepseek-reasoner 'prompt' to run prompts.
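Spelled out, those two steps look like this (assuming the extra-openai-models.yaml entry registers the model under the alias deepseek-reasoner):

```shell
# Store the DeepSeek API key under the name the YAML config refers to;
# you'll be prompted to paste the key interactively
llm keys set deepseek

# Run a prompt against the hosted model via the OpenAI-compatible API
llm -m deepseek-reasoner 'a joke about a pelican and a walrus who run a tea room together'
```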
This won't show you the reasoning tokens, unfortunately. Those are served up by the API (example here) but LLM doesn't yet have a way to display them.