DeepSeek's v3 model, released on Christmas Day, was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.

There's a whole lot of stuff in the new release.

DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:

DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.

I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models are something I can easily play with myself. That's where the new distilled models come in.

> To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.

This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).

Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?

(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)

Licensing aside, these distilled models are fascinating beasts.

## Running DeepSeek-R1-Distill-Llama-8B-GGUF

Quantized versions are already starting to show up. So far I've tried just one of those, unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.

I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.

First I fetched and ran the model using Ollama itself:

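Something along these lines should do it, using Ollama's support for pulling GGUF models directly from Hugging Face (the Q8_0 quantization tag here is an assumption; Unsloth published several quantizations):

```bash
# Fetch the GGUF from Hugging Face and start an interactive chat session.
# Q8_0 is one of several available quantizations - pick one to fit your RAM.
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```
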
This downloads around 8.5GB of model data and starts an interactive chat interface.

Once the model has been fetched, LLM can talk to it as well. I like using LLM for experiments because it logs everything to SQLite for later exploration.

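A couple of commands for poking around in those logs (the flags here follow the LLM CLI docs; treat them as a sketch):

```bash
# Print the location of the SQLite database that stores the logs
llm logs path

# Show the most recent logged prompt and response
llm logs -n 1
```
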
I installed the plugin and ran it like this:

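Roughly like this, assuming the model is exposed to LLM under the same ID Ollama registered for it:

```bash
# Install the plugin that lets LLM talk to local Ollama models
llm install llm-ollama

# Run a prompt through the distilled 8B model
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```
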
Each response starts with a `<think>...</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:

> Okay, so I need to come up with a joke about a pelican and a walrus who run a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.
>
> First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near beaches or water. Walruses have big teeth, thick fur, and they're usually found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.
>
> [...]

It then thought for 20 paragraphs before outputting the joke! Here's the full transcript, and here's the output at the end, after the `</think>` tag:

> The Joke:
>
> A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place saucers on the table, causing a few spills.
>
> After a series of comical mishaps, the walrus looks up and says with a grin, "This isn't so fishy anymore." The pelican smirks and says, "Maybe not, but we do have a lot of krill in our tea!"
>
> Explanation:
>
> - Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comedic chaos.
> - Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.

So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these new models work.

This was a relatively small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B, currently my favourite GPT-4 class model that I've run on my own machine, I have high expectations.

Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this, a 34GB download:

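Presumably the same Hugging Face pattern, with a heavier quantization (the Q3_K_M tag is a guess; a 34GB file for a 70B model implies roughly 3-bit quantization):

```bash
# Same trick as the 8B model, with a more aggressively quantized 70B GGUF
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```
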
## Can it draw a pelican?

I tried my classic Generate an SVG of a pelican riding a bicycle prompt too. It did not do very well:

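For reference, the invocation, assuming the same model ID as before:

```bash
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'Generate an SVG of a pelican riding a bicycle'
```
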
It looked to me like it got the order of the elements wrong, so I followed up with:

> the background ended up covering the rest of the image

It thought some more and gave me this:

As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.

## Other ways to try DeepSeek-R1

If you want to try the model out without installing anything you can do so using chat.deepseek.com. You'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.

DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that via LLM by dropping this into your extra-openai-models.yaml configuration file:

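The entry looks something like this (the field names follow LLM's documented extra-openai-models.yaml format; api_key_name ties it to the key set in the next step):

```yaml
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```
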
Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner 'prompt'` to run prompts.

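For example (the prompt here is arbitrary):

```bash
llm keys set deepseek
# Paste in your DeepSeek API key when prompted, then:
llm -m deepseek-reasoner 'a joke about a pelican and a walrus'
```
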
This won't show you the reasoning tokens, unfortunately. Those are served up by the API (example here) but LLM doesn't yet have a way to display them.
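
If you want to see them, here's a sketch of calling the OpenAI-compatible endpoint directly; the reasoning_content field is my reading of DeepSeek's API documentation, so treat it as an assumption:

```bash
# deepseek-reasoner responses include the reasoning alongside the answer:
# choices[0].message.reasoning_content and choices[0].message.content
curl https://api.deepseek.com/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
  -d '{
        "model": "deepseek-reasoner",
        "messages": [{"role": "user", "content": "a joke about a pelican and a walrus"}]
      }'
```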