- Joined
- Jul 13, 2006
- Messages
- 2,238
Putting this here now, will update when im off work with details of my workflow (not trying to type that out on my phone, haha).
Follow along with the video below to see how to install our site as a web app on your home screen.
Note: This feature may not be available in some browsers.
This looks nearly exactly like what I'm looking for except not powershell.New to this LLM stuff as well... got baited in from the Gemma4 talk... (hopefully this isn't hijacking.. but could be useful for someone that is as new as I was.. I know this is most likely elementary for some of [H])
I'm on a Windows workstation; home rig with a 9070 XT. My main use case would be my own PowerShell collaboration partner to assist me with scripting, etc.. as my main job is a Windows system engineer...
So far I've downloaded/installed Ollama, Python (along with uvx) installed Open WebUI (this binds to localhost:8080)
I then launch Open WebUi in an isolated environment (whenever I want to utilize the LLM's via the web ui)
View attachment 807379
Then my AI virgin self learned to pull down the LLM's I need to run the following, which I have so far pulled down:
ollama run gemma4:latest ( this pulls down the E4B size I believe )
ollama run qwen2.5-coder:14b
View attachment 807383
And .. away we go...
View attachment 807385
#!/usr/bin/env bash
set -euo pipefail
# echo "[*] Killing all GPU-using processes…"
# sudo systemctl isolate multi-user.target
# sudo systemctl mask --force display-manager.service
# sudo systemctl mask display-manager
# sudo systemctl stop display-manager
# sudo kill -9 $(nvidia-smi --query-compute-apps=pid --format=csv,noheader) 2>/dev/null || true
# sudo fuser -k -9 /dev/nvidia*
# echo "[*] GPU VRAM should now be fully freed."
# Launch vLLM with Qwen3.6 dense 8‑bit
# --model /workspace/Qwen3.6_dense_8bit \
docker run --gpus all --rm \
-p 8000:8000 \
--ipc=host \
-v ~/.cache/huggingface:/root/.cache/huggingface \
-v /home/user/workspace:/workspace \ # Path to your local model
vllm/vllm-openai:cu130-nightly \
--model /workspace/Qwen3.6_dense_hotd_int8_MTP \
--tensor-parallel-size 2 \
--attention-backend FLASHINFER \
--performance-mode interactivity \
--max-model-len auto \
--max-num-batched-tokens 2048 \
--max-num-seqs 1 \
--gpu-memory-utilization 0.93 \
--compilation-config '{"mode":"VLLM_COMPILE","cudagraph_capture_sizes":[3]}' \
-O3 \
--async-scheduling \
--language-model-only \
--tool-call-parser qwen3_coder \
--reasoning-parser qwen3 \
--enable-auto-tool-choice \
--speculative-config '{"method":"mtp","num_speculative_tokens":2}' \
--default-chat-template-kwargs.preserve_thinking true \
--mamba-cache-mode all \
--mamba-block-size 8 \
--enable-prefix-caching \
--enable-chunked-prefill
#!/usr/bin/env bash
set -euo pipefail
echo "[*] Stopping old container (if running)…"
docker stop open-webui 2>/dev/null || true
echo "[*] Removing old container (if exists)…"
docker rm open-webui 2>/dev/null || true
echo "[*] Launching Open WebUI…"
docker run -d \
-p 3000:8080 \
-v open-webui:/app/backend/data \
-e OPENAI_API_BASE_URL=http://172.17.0.1:8000/v1 \
--name open-webui \
--restart always \
ghcr.io/open-webui/open-webui:main
echo "[*] Open WebUI is now running at: http://localhost:3000"
To compute **Fourier coefficients** for a continuous-time (CT) signal, you're typically working with the **Continuous-Time Fourier Series (CTFS)**, which applies only to **periodic** signals. If your signal is aperiodic, you'd use the Fourier Transform instead (which yields a continuous spectrum, not discrete coefficients). I'll cover both, but focus on coefficients as requested.
---
### 🔹 1. When to Use Fourier Coefficients
Use Fourier coefficients when:
- The signal \(x(t)\) is **periodic**: \(x(t+T) = x(t)\) for all \(t\).
- It satisfies the **Dirichlet conditions**:
- Absolutely integrable over one period: \(\int_T |x(t)| dt < \infty\)
- Finite number of maxima/minima and discontinuities per period.
---
### 🔹 2. Core Formulas (Complex Exponential Form)
For a periodic CT signal with period \(T\) and fundamental frequency \(\omega_0 = \frac{2\pi}{T}\):
**Synthesis (reconstruction):**
\[
x(t) = \sum_{k=-\infty}^{\infty} a_k e^{j k \omega_0 t}
\]
**Analysis (coefficient extraction):**
\[
a_k = \frac{1}{T} \int_{t_0}^{t_0+T} x(t) e^{-j k \omega_0 t} \, dt
\]
- \(a_k\) are the **Fourier coefficients** (complex numbers).
- \(t_0\) can be any starting point; choose it to simplify the integral (e.g., \(0\) or \(-T/2\)).
- \(a_0\) is the **DC component** (average value over one period).
---
### 🔹 3. Step-by-Step Computation
1. **Find the period \(T\)** and compute \(\omega_0 = 2\pi/T\).
2. **Choose integration limits** covering exactly one period.
3. **Split the integral** if \(x(t)\) is defined piecewise.
4. **Evaluate \(a_0\) separately** (set \(k=0\) or use \(\frac{1}{T}\int_T x(t)dt\)).
5. **Compute \(a_k\) for \(k \neq 0\)** using the analysis formula.
6. **Simplify using symmetries** (saves work and reveals properties):
- Real & even \(x(t) \Rightarrow a_k\) real & even
- Real & odd \(x(t) \Rightarrow a_k\) purely imaginary & odd
- Half-wave symmetry \(\Rightarrow a_k = 0\) for even \(k\)
7. **Check convergence**: At discontinuities, the series converges to the midpoint value. Gibbs phenomenon may appear in partial sums.
---
### 🔹 4. Worked Example
**Signal**: Periodic rectangular pulse
\(x(t) = \begin{cases} 1, & -T/2 < t < 0 \\ 0, & 0 \le t < T/2 \end{cases}\), repeated every \(T\).
- \(\omega_0 = 2\pi/T\)
- Choose limits: \([-T/2, T/2]\)
**DC component**:
\[
a_0 = \frac{1}{T} \int_{-T/2}^{0} 1 \, dt = \frac{1}{2}
\]
**For \(k \neq 0\)**:
\[
a_k = \frac{1}{T} \int_{-T/2}^{0} e^{-j k \omega_0 t} dt
= \frac{1}{T} \left[ \frac{e^{-j k \omega_0 t}}{-j k \omega_0} \right]_{-T/2}^{0}
= \frac{1}{T} \cdot \frac{1 - e^{j k \omega_0 T/2}}{j k \omega_0}
\]
Since \(\omega_0 T = 2\pi\), \(e^{j k \pi} = (-1)^k\):
\[
a_k = \frac{1}{j k 2\pi} \left(1 - (-1)^k\right)
\]
- If \(k\) is even \(\Rightarrow a_k = 0\)
- If \(k\) is odd \(\Rightarrow a_k = \frac{1}{j k \pi}\)
Final:
\[
a_k = \begin{cases}
\frac{1}{2}, & k = 0 \\
0, & k \text{ even}, \, k \neq 0 \\
\frac{1}{j k \pi}, & k \text{ odd}
\end{cases}
\]
---
### 🔹 5. Practical & Numerical Notes
- **Analytical vs Numerical**: Closed-form solutions work for piecewise smooth signals. For arbitrary waveforms, discretize the integral:
\[
a_k \approx \frac{1}{N} \sum_{n=0}^{N-1} x(nT/N) e^{-j k n 2\pi / N}
\]
This is essentially the **DFT/FFT** applied to one period of sampled data.
- **Windowing**: If you only have a finite observation window of a signal, multiply by a window function before computing coefficients to reduce spectral leakage.
- **Trigonometric Form**: If you prefer real coefficients:
\[
x(t) = a_0 + \sum_{k=1}^{\infty} \left( A_k \cos(k\omega_0 t) + B_k \sin(k\omega_0 t) \right)
\]
where \(A_k = a_k + a_{-k}\), \(B_k = j(a_k - a_{-k})\).
---
### 🔹 6. What If the Signal Isn't Periodic?
For aperiodic CT signals, use the **Continuous-Time Fourier Transform (CTFT)**:
\[
X(\omega) = \int_{-\infty}^{\infty} x(t) e^{-j\omega t} dt
\]
This yields a **continuous spectrum** \(X(\omega)\), not discrete coefficients. You can approximate Fourier coefficients by:
1. Truncating the signal to a large window \(T\)
2. Treating it as periodic
3. Computing \(a_k\) as above (introduces spectral leakage/aliasing if not careful)
---
### ✅ Quick Checklist
- [ ] Signal is periodic → use CTFS coefficients
- [ ] Found \(T\) and \(\omega_0\)
- [ ] Chose convenient integration limits
- [ ] Handled \(k=0\) separately
- [ ] Used symmetry to simplify
- [ ] Verified Dirichlet conditions / convergence behavior
- [ ] For numerical work: sampled uniformly over one period, used FFT
Let me know your specific signal or context (analytical vs numerical, real/complex, periodic/aperiodic), and I can tailor the steps or derive coefficients for it.
Read up a bit on VLLM... not sure it's the right option for me. From the sounds of things it's good at multi-GPU and multiple simultaneous requests. Presently I don't have either. Aside from screwing around with image generation in ComfyUI I've just been firing up LM Studio, plugging models into an IDE and using them to write code. Mostly PyCharm + Qwen 3.6 at home. At work it's IntelliJ + Google Gemini. Maybe I would if I started playing with OpenClaw. I'm more likely to get some good out of a low level llama.cpp setup. I have a 5090, 285k and 64GB DDR5-6400 on my main rig and a 3090, i9-10980XE, and 64GB in my old rig. llama.cpp is supposed to be the way to go for partial CPU/GPU offloading. Haven't tried that yet, but it would let me run larger models. Funny thing is my old rig may actually be faster at CPU inference. 18 cores with AVX-512 and quad channel DDR4-3600, so it actually has more memory bandwidth than my current rig. That DDR5-6400 was supposed to be temporary until larger XMP CUDIMMs came out. Still waiting, and now ram is stupid expensive.
I could maybe get the old rig up to 128GB of mismatched ram without buying anything. It's end up running at DDR4-2133 (JEDEC) speeds or add heatsinks to a cheap kit of green PCB no heatsink Crucial and do some manual OC. The Crucial is DDR4-3200 non-XMP. It's the XMP 3600 that ends up at 2133 with XMP off. Not sure if it's worth the hassle. Mostly I'm just trying to figure out how much better a 128GB setup would be.
I've been eyeing an NV DGX Spark or AMD Ryzen AI 395+ machine. I'd consider a Mac but you can't actually order a Studio with more than 96GB right now. Given that all I want it for is running AI stuff the Mac is out. Not enough ram. Might get an Air next time I need a laptop. I looked at various multi-GPU options, but getting to 96-128GB of combined vram and building the rest of a system that can suppor them is difficult without going way over the price of one of those mini-PCs.
Yeah, I haven't personally dove into trying to run mixture of expert (MOE) models mostly because once I started playing with Qwen 3.6 I personally felt model was good enough for my needs and started trying to use it.Read up a bit on VLLM... not sure it's the right option for me. From the sounds of things it's good at multi-GPU and multiple simultaneous requests. Presently I don't have either. Aside from screwing around with image generation in ComfyUI I've just been firing up LM Studio, plugging models into an IDE and using them to write code. Mostly PyCharm + Qwen 3.6 at home. At work it's IntelliJ + Google Gemini. Maybe I would if I started playing with OpenClaw. I'm more likely to get some good out of a low level llama.cpp setup. I have a 5090, 285k and 64GB DDR5-6400 on my main rig and a 3090, i9-10980XE, and 64GB in my old rig. llama.cpp is supposed to be the way to go for partial CPU/GPU offloading. Haven't tried that yet, but it would let me run larger models. Funny thing is my old rig may actually be faster at CPU inference. 18 cores with AVX-512 and quad channel DDR4-3600, so it actually has more memory bandwidth than my current rig. That DDR5-6400 was supposed to be temporary until larger XMP CUDIMMs came out. Still waiting, and now ram is stupid expensive.
I could maybe get the old rig up to 128GB of mismatched ram without buying anything. It's end up running at DDR4-2133 (JEDEC) speeds or add heatsinks to a cheap kit of green PCB no heatsink Crucial and do some manual OC. The Crucial is DDR4-3200 non-XMP. It's the XMP 3600 that ends up at 2133 with XMP off. Not sure if it's worth the hassle. Mostly I'm just trying to figure out how much better a 128GB setup would be.
I've been eyeing an NV DGX Spark or AMD Ryzen AI 395+ machine. I'd consider a Mac but you can't actually order a Studio with more than 96GB right now. Given that all I want it for is running AI stuff the Mac is out. Not enough ram. Might get an Air next time I need a laptop. I looked at various multi-GPU options, but getting to 96-128GB of combined vram and building the rest of a system that can suppor them is difficult without going way over the price of one of those mini-PCs.
A basic thread for a while now, and it used to mean more than it does now. With the advent of Hermes, you can set your goals for your agent and they will work without much if any input for a long time. That said, I am running Qwen 3.5 122B right now, because Qwen 3.6 36B was not getting things done. 3.5 seems to be pretty fast on the coding front, and while I have not run any benchmarks, I know beating on 3.6 for five hours and not getting things done makes 3.5 seem hella fast. Really just depends on what your goal and your timeline is.It really depends on what you want to do with it. The large amount of memory means you can use bigger models, but they sometimes run so slow it's not worth it.
Sorry, stuff came up this weekend. I've got an RX 6800 (non-XT).This is the bash script I run to start the vllm server. Setup shouldn't be too terrible because it is a docker. However, this is an Nvidia docker image, so you would need to tell me what GPUs you're running and we can find a ROCm container that will work for you.
Yeah, that tracks. Ubuntu is effectively the windows of linux distros.I may have to switch to something other than arch...AMD's instructions assume Ubuntu and apt, and the commands don't translate directly to using pacman, and I don't to spend a ton of time figuring that out.
Yeah, I never claimed setup would be super easy, which is why I was sharing my setup to hopefully speed you along.Good Lord, this is crazy. I installed Ubuntu 26.04, because that's what Ubuntu.com offered. AMD's instructions you linked expect 22.04 or 24.04, so I tried the latter, silly me. I got to the "installing kernel drivers" part of the ROCm quickstart, and it sat and spun for a while and started to recompile something large, then bombed out, probably because I have a newer version of the kernel than expected (7.0.0.22 vs .14), I guess. So I suppose I'll just reinstall AGAIN, but do 24.04 this time.
I did get ollama with llama3 installed pretty easily at least, and it was able to create a workable C# Hello World program.
What was Qwen 3.6 messing up on for you?A basic thread for a while now, and it used to mean more than it does now. With the advent of Hermes, you can set your goals for your agent and they will work without much if any input for a long time. That said, I am running Qwen 3.5 122B right now, because Qwen 3.6 36B was not getting things done. 3.5 seems to be pretty fast on the coding front, and while I have not run any benchmarks, I know beating on 3.6 for five hours and not getting things done makes 3.5 seem hella fast. Really just depends on what your goal and your timeline is.
Makes sense. I followed your ROCm link, installed docker, and failed partway through AMD's instructions because something (amdgpu-dkms, I think) wouldn't compile properly. Something about linux-headers-7.0.0.14-generic, when what my kernel was, was .22. I didn't want to spend time figuring out hundreds of lines of compiler errors tonight, not when I hadn't had any chance to do video games yet.Again, I would really try to stick with docker if you can and write launch scripts to keep what works.
It was not messing up. It was simply not a robust enough model for the coding I was doing. It simply could not figure out some of the things I was asking it to do. Had to go a bit "bigger."What was Qwen 3.6 messing up on for you?
That might actually just be configuration or it could be the model. It'll just stop dead in its tracks if it hits the token limit. Max context length is generally something you can set in whatever app you're using (e.g. in ollama), and models also have limits. I can't say for certain that's the issue but that is what will happen if you hit the context length limit.This time it spent a lot of time on a digression about the using statement, including a gratuitous example involving student grades, and then got halfway through the program I requested before, again, dying in the middle of the program.
https://huggingface.co/Qwen/Qwen2.5-Coder-14BI remembered yesterday that I also have a Snapdragon X laptop, so I asked Gemini about using it. It said yeah and suggested a few models. So I installed ollama and starcoder2:7b last night--I mentioned this in the thread on the other subforum. Then I downloaded a couple of Gemma4 models (e4b and a 4-bit quantized 12b).
I tried again with one of the three, and it did a significantly better job, although it did it in a way unlike the way I originally solved the problem, or how ChatGPT did it last year. No biggie, except I noticed I had given it one detail wrong so I asked it to retry "with the assumption the image will be 1bpp instead of 2bpp" and something about color assumptions that isn't really important. It gave me about half the revised program, and just stopped, no errors or diagnostics, midway through a statement. Also, both times it gave me a partial C# class and then forgot to write the separate program that defines the (gui) window layout, so it didn't actually come up with a complete answer, although after reading the whole thing I think it was reasonably close. Also, last time I tried (12-18 months ago, maybe?), it completely botched my instructions that it should handle me dropping files onto the window, which, to be fair, is a bit complicated, and there wasn't much documentation. This time, it got that right and even added support for dropping multiple files, which I never got working. So definite improvement in spite of the ultimate failure.
Today I tried using gemma4:12b to do something I already had success with with Claude 4.5 (I think, or maybe ChatGPT) a month or two ago. Given an XML file that represents a SOAP template (some tags are empty, others have placeholder values) create a GUI program that lets me fill in values, generates a file based on the template, and calls cUrl to submit the file to an API endpoint. This time it spent a lot of time on a digression about the using statement, including a gratuitous example involving student grades, and then got halfway through the program I requested before, again, dying in the middle of the program. I asked it to try again without the using digression and it started repeating the exact same stuff, so I stopped it and went to lunch.
Yeah, headers version needs to match or it won't find them (they are in separate directories in the /usr/src iirc). If you installed a new kernel+headers and haven't rebooted, it may fail because the old headers were removed but you are still running the old kernel.Makes sense. I followed your ROCm link, installed docker, and failed partway through AMD's instructions because something (amdgpu-dkms, I think) wouldn't compile properly. Something about linux-headers-7.0.0.14-generic, when what my kernel was, was .22. I didn't want to spend time figuring out hundreds of lines of compiler errors tonight, not when I hadn't had any chance to do video games yet.![]()
Yeah, I didn't max it out because of that. Then I went down a side passage into an entirely new rabbit hole about trying to use the NPU instead of the CPU (this was on my Snapdragon laptop, not the Ryzen 255 mini pc I started with.)That might actually just be configuration or it could be the model. It'll just stop dead in its tracks if it hits the token limit. Max context length is generally something you can set in whatever app you're using (e.g. in ollama), and models also have limits. I can't say for certain that's the issue but that is what will happen if you hit the context length limit.
You might not want to set the context length to maximum depending on how much memory you have on that machine. Larger contexts use up more memory, and running out of memory can be more annoying than hitting the context length limit. Run out of ram and other random apps crash, or you start swapping, machine gets really slow, and you're chewing through write cycles on your SSD.
I'm willing to try other stuff than ollama. At work I use VS Code and my works' Copilot license, and there's a built-in chat. I'd like to work that way on Linux (or reasonably close to it) but I'm not wedded to that model.There's usually something that logs the token rate, cache hits etc...but not sure for ollama?
continue.dev extension works ish.I'm willing to try other stuff than ollama. At work I use VS Code and my works' Copilot license, and there's a built-in chat. I'd like to work that way on Linux (or reasonably close to it) but I'm not wedded to that model.
I know what the acronym stands for but I have been avoiding learning about them before this. Once work forced my hand, I figured I might as well see if I can get any personal benefit.mcp server
#include <SdFat.h>
#include <Arduino_Get_Firmware_Revision.h>
#; 130
// Configuration for SD Card (Standard for Adafruit M0)
Sd { } ; // Placeholder if using standard SdRoot, but better to use Sd { }
Sd { } ;
// For the M0, we use the standard SdFat library structure
Sd { } ;
using namespace SdFat;
It then proceeds to not bother to actually figure out the actual dimensions. A little disappointing.// Locate dimensions from DIB info (Offset 14 in many systems, but let's
look at standard headers)
uint32_t bytesPerPixel = (bitDepth == 1) ? 1 : 0; // Placeholder for
logic
// In a real scenario, we would parse the header correctly to find
width/height.
int originalWidth = 128; // Example placeholder - should be parsed from
buffer
int originalHeight = 128; // Example placeholder
I got it from lmstudio.ai. That's wher HuggingFace says to get it from: https://huggingface.co/lmstudio-ai Hope that was the right move.Where's the correct place to get LM Studio? When I did a search for it I found a bunch of sites, and I don't want to grab malware from the wrong one.