LOOM-GPT — Weaving Small Language Models Together

For a long time I felt like I was using language models without actually understanding them. Modern AI development makes it really easy to connect APIs, send prompts, and build something that looks intelligent, but most of the interesting parts stay hidden underneath. I wanted to understand what training actually looks like, what tokenization changes, why models overfit, and whether small language models could still create interesting behavior without depending on huge pretrained systems.

Instead of reading more papers or watching another explanation video, I decided to build something myself. That project eventually became LOOM-GPT.

## what is LOOM-GPT?

LOOM-GPT is a local AI experimentation framework that allows users to train small GPT-style transformer language models completely from scratch on custom datasets. Users can provide folders containing notes, text files, markdown documents, CSV datasets, JSONL files, or even source code repositories, and the framework handles the preparation pipeline automatically.

The system processes datasets, tokenizes text using a byte-level tokenizer, builds training batches, tracks validation performance, stores checkpoints, and finally allows users to generate text from prompts using their trained models.
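To make the byte-level step concrete, here is a minimal sketch of what byte-level tokenization and batch building can look like. It is an illustration under assumptions, not LOOM-GPT's actual code: the names `encode`, `decode`, and `make_batch`, and the `block_size` and `batch_size` parameters, are hypothetical.

```python
import random


def encode(text: str) -> list[int]:
    """Byte-level tokenization: every UTF-8 byte becomes one token id (0-255)."""
    return list(text.encode("utf-8"))


def decode(ids: list[int]) -> str:
    """Turn token ids back into text; invalid byte sequences are replaced."""
    return bytes(ids).decode("utf-8", errors="replace")


def make_batch(ids, block_size, batch_size, rng):
    """Sample random windows; each target row is its input shifted by one token."""
    xs, ys = [], []
    for _ in range(batch_size):
        start = rng.randrange(len(ids) - block_size)
        xs.append(ids[start:start + block_size])
        ys.append(ids[start + 1:start + block_size + 1])
    return xs, ys


ids = encode("héllo loom")  # "é" takes two bytes, so this is 11 tokens
xs, ys = make_batch(ids, block_size=4, batch_size=3, rng=random.Random(0))
```

A byte-level vocabulary never needs more than 256 entries and can represent any input, at the cost of longer sequences than a word-piece tokenizer would produce.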

The goal was never to compete with larger models or replace existing assistants. I wanted something that felt closer to a laboratory than a product — something transparent enough to learn from and flexible enough to experiment with.

## the idea that changed the project

Originally LOOM was only supposed to train one small transformer.

But while building it I kept thinking about something strange.

Why train one model to understand everything when multiple smaller models could specialize independently?

Instead of forcing technical writing, poetry, philosophy, documentation, and personal notes into one training process, what if each model learned one domain properly and then collaborated during generation?

That idea became the feature that eventually defined the project.

## model weaving

Model Weaving allows multiple specialist models to generate together.

Users can train separate specialist models on different datasets and assign influence weights during generation.

- Poetry Specialist → 70%
- Technology Specialist → 30%

During generation, each specialist predicts the next token independently. LOOM-GPT blends those predictions according to the assigned influence values and samples the final output from the combined probability distribution.

The result becomes a controllable mixture of multiple domains. Generation can shift style naturally depending on which specialist contributes more at different points in the sequence.
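The blending step described above can be sketched in a few lines. This assumes the blend is a weighted average of each specialist's next-token probabilities, which is one plausible reading of "blends those predictions"; the real implementation may combine logits instead. The function names and the toy logits are hypothetical.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weave(distributions, weights):
    """Weighted average of per-specialist next-token distributions."""
    total_w = sum(weights)
    vocab = len(distributions[0])
    return [
        sum(w * dist[i] for w, dist in zip(weights, distributions)) / total_w
        for i in range(vocab)
    ]


def sample(probs, rng):
    """Draw one token id from the blended distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Toy 4-token vocabulary; these logits are made up for illustration.
poetry_logits = [2.0, 0.5, 0.1, -1.0]
tech_logits = [-1.0, 0.2, 2.5, 0.3]

blended = weave([softmax(poetry_logits), softmax(tech_logits)], [0.7, 0.3])
next_token = sample(blended, random.Random(0))
```

Because each input is a valid distribution and the weights are normalized, the blended result also sums to one, so it can be sampled directly without a second softmax.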

I liked this because generation suddenly stopped feeling like one model speaking and started feeling more like multiple systems negotiating what gets written next.

## seeing generation happen

One feature I ended up enjoying more than expected was token-level tracing.

Traditional generation pipelines usually hide decision making. You get output but not much insight into how it appeared.

LOOM records specialist influence at the token level and allows users to inspect which model contributed to each generated token.

Watching specialists dominate different parts of generation made the process feel surprisingly interpretable and much easier to study.
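One way to picture a trace record is as the share of the chosen token's blended probability that came from each specialist. The sketch below assumes the weighted-average blend and uses a hypothetical `trace_step` helper with made-up distributions; it is not the format LOOM-GPT actually stores.

```python
def trace_step(names, distributions, weights, token):
    """Attribute one generated token to the specialists that proposed it.

    Each specialist's share is its weighted probability for `token`
    divided by the total weighted probability across all specialists.
    """
    parts = [w * dist[token] for w, dist in zip(weights, distributions)]
    total = sum(parts)
    return {
        "token": token,
        "influence": {name: part / total for name, part in zip(names, parts)},
    }


# Toy 3-token vocabulary; the probabilities are made up for illustration.
names = ["poetry", "technology"]
weights = [0.7, 0.3]
distributions = [
    [0.6, 0.3, 0.1],  # poetry specialist
    [0.1, 0.2, 0.7],  # technology specialist
]

record = trace_step(names, distributions, weights, token=2)
```

In this toy case the technology specialist gets roughly 75% of the credit for token 2 even though its global weight is only 30%, because it assigns that token far more probability than the poetry specialist does.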

## what surprised me

The biggest surprise was how quickly small models expose mistakes.

Small changes in preprocessing created noticeable changes in generation quality. Repetitive datasets caused validation loss to collapse quickly, and overfitting became visible almost immediately.
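A quick way to check how repetitive a dataset is before training is to measure the fraction of duplicate lines. When the same passages end up on both sides of a train/validation split, validation loss can look better than the model really is. The helper below is a hypothetical sketch, not part of LOOM-GPT's preprocessing.

```python
def duplicate_ratio(lines):
    """Fraction of non-empty lines that repeat an earlier line (0.0 to 1.0)."""
    cleaned = [ln.strip() for ln in lines if ln.strip()]
    if not cleaned:
        return 0.0
    return 1.0 - len(set(cleaned)) / len(cleaned)


# Made-up sample: four non-empty lines, two of them unique.
sample_lines = ["roses are red", "roses are red", "", "violets are blue", "roses are red"]
ratio = duplicate_ratio(sample_lines)
```

A high ratio suggests deduplicating before splitting, so the validation set measures generalization instead of memorization.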

Another surprise was that blended outputs were often much more coherent than expected. Instead of creating noisy text, specialist mixing sometimes produced outputs with surprisingly strong style and structure.

## why i built this

LOOM-GPT exists mostly because I wanted a better way to understand transformer systems beyond simply using them.

It became a place to experiment with training, generation, architecture choices, and controllable language modeling while staying completely local.

More than anything, it became proof that small models can still be interesting if you stop treating them like miniature versions of larger systems and start letting them specialize.

## what's next

LOOM currently supports local training, specialist generation, and model weaving.

But after finishing the first version, more questions started showing up.

What happens if specialists adapt their weights dynamically? What happens if weaving expands beyond text? What happens if multiple devices participate in generation?

I don't have those answers yet.

But there is already another folder sitting in my workspace.

loom_v2/

We'll see where this goes.

Latest update (07/06/2026): LOOM-GPT is now available on PyPI: https://pypi.org/project/loom-gpt/0.1.0/

Train small specialist GPT-style transformers on your own datasets, generate text locally, and experiment with LOOM's signature Model Weaving feature to blend multiple specialist models together. Install with:

    pip install loom-gpt==0.1.0
    loom --help
