Do language models “lexicalize” certain multi-token words or phrases
(i.e., treat them as atomic units)? How would we measure lexicalization
in LMs?
Irhum makes an
observation that I have thought about from time to time:
during training, LoRA-adapted model weights decay toward the original
model’s weights (not toward 0). How much of LoRA’s success could be attributed…
pico.sh seems like a cool, minimal-effort tool
for blogging, but it doesn’t support math, so I’m going to roll my
own.
A Torch bug that took me an hour to fix: I wrapped a function in
@torch.inference_mode, which called another function, which
called yet another function that tried to call
.backward().
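A minimal reproduction of this failure mode (with hypothetical functions, not my actual code): everything computed under inference mode is an inference tensor that tracks no gradients, so a backward() call buried deep in the call stack raises a RuntimeError.

```python
import torch

def innermost(x):
    # Several frames down, something tries to backprop.
    loss = (x * 2.0).sum()
    loss.backward()  # fails: loss was computed under inference mode

def middle(x):
    return innermost(x)

@torch.inference_mode()
def outer(x):
    # Ops in here produce inference tensors; autograd is disabled
    # even more thoroughly than with torch.no_grad().
    return middle(x)

x = torch.ones(3, requires_grad=True)
try:
    outer(x)
    failed = False
except RuntimeError:
    failed = True
```

The decorator on the outermost function silently changes the behavior of everything it transitively calls, which is why the bug was hard to localize.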
I created an RSS feed for my website. I wrote it by hand, starting
from this template. I don’t use any frameworks to maintain my website;
I write everything by hand. I don’t change my website often…
Suppose you have n language
models with embedding size d,
vocabulary size v, and softmax
matrices W1, W2, …, Wn ∈ ℝ^{v×d},
and you want to sample from them as an ensemble. One…
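The post is cut off here, but one common baseline for this setup is a product-of-experts ensemble: average the models’ log-probabilities, renormalize, and sample. A sketch with random stand-ins for the Wi and a shared hidden state h (both hypothetical; not necessarily the method the post develops):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, v = 3, 8, 16

# Hypothetical softmax matrices W_i in R^{v x d} and a shared hidden state h.
Ws = [rng.standard_normal((v, d)) for _ in range(n)]
h = rng.standard_normal(d)

def log_softmax(z):
    z = z - z.max()  # shift for numerical stability
    return z - np.log(np.exp(z).sum())

# Product of experts: average per-model log-probabilities, renormalize.
logps = np.stack([log_softmax(W @ h) for W in Ws])
avg = log_softmax(logps.mean(axis=0))

# Sample one token from the ensemble distribution.
token = rng.choice(v, p=np.exp(avg))
```

Averaging logprobs (a geometric mean of the distributions) concentrates mass on tokens all models agree on; averaging probabilities (a mixture) behaves quite differently.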
I have created a discord server for people interested in
collaborating with the lab. Email me for an invite!
Many LLM APIs return top-k
logprobs in their outputs. What if we want to obtain all the
logprobs? Here I present two algorithms for obtaining logprobs from an
LLM API. Both depend on the API allowing us to add a logit bias
to…
I would like to define a differentiable function f : {0, 1}^{log v} → {0, 1}^v
that converts binary number representations of log v bits into one-hot vectors.
This can be accomplished by using fuzzy logic operators to…
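One concrete construction (an assumption on my part, since the post is cut off): use the product t-norm, so output entry j is the fuzzy AND over the bits of j, taking x_k where bit k of j is 1 and (1 − x_k) where it is 0. On exact boolean inputs this yields an exact one-hot vector, and it is differentiable everywhere.

```python
import numpy as np

def binary_to_onehot(x):
    """Map log2(v) fuzzy bits in [0, 1] to a (fuzzy) one-hot vector of length v.

    Product t-norm fuzzy logic: out[j] = prod_k (x[k] if bit k of j else 1 - x[k]),
    with x[0] treated as the least-significant bit.
    """
    m = len(x)
    v = 1 << m
    out = np.ones(v)
    for k in range(m):
        bit_k = (np.arange(v) >> k) & 1  # bit k of each output index j
        out *= np.where(bit_k == 1, x[k], 1.0 - x[k])
    return out

# On exact boolean inputs this is exactly one-hot: bits (LSB-first) 1,0,1 -> index 5.
y = binary_to_onehot(np.array([1.0, 0.0, 1.0]))
```

With soft inputs like [0.9, 0.1, 0.9] the output is a smooth distribution peaked at index 5, which is what makes the map usable inside a differentiable model.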
TL;DR: we can use any intermediate LM representation to
prove that a subset of next-token candidates have non-zero
probability.
Direct sampling from model output distributions often gives
incoherent outputs. Some have attributed this to a heavy tail, i.e., the
model assigns too much probability to low-probability tokens. My goal is
to test this hypothesis.
These are some visualizations I have made over the years, both for
academia and for fun!
ss.py is my personal command line tool for searching and
citing academic papers via the Semantic Scholar API. About page. GitHub.
This ftplugin updates the word count in the statusline on every save;
updating more frequently slows Vim down and causes intermittent rendering
problems.
I finally got Zathura (the PDF viewer) configured the way I want it
on macOS. I installed it with Homebrew following these instructions. I
set up an Automator script to launch Zathura for me…
I learned some Blender and used some open source elevation data to
make a nice looking relief map of Washington State. Check out how I made
it here. 📍
This last weekend, Caitlyn and I
backpacked up Cottonwood Wash in Utah’s San Rafael Swell, a beautiful
canyon all to ourselves. We even spotted some petroglyphs. ⛰️
These are my notes from the course
CS183: Foundations of Machine Learning. They are imperfect and
incomplete but I really enjoyed making them. If you would like to make
edits…