Matthew Finlayson's posts [RSS]

Recent writing, newest first.

Do language models lexicalize phrases?

Do language models “lexicalize” certain multi-token words or phrases (i.e., treat them as atomic units)? How would we measure lexicalization in LMs?

LLMs have built-in MACs

pico.sh seems like a cool, minimal-effort tool for blogging, but it doesn’t support math, so I’m going to roll my own.

Bug in my torch code, random links.

Torch code bug that took me an hour to fix: I wrapped a function in @torch.inference_mode, and it called another function, which called a third function that was trying to call backward().

Hello world, MathML in NetNewsWire

I created an RSS feed for my website. I did it by hand, using this template. I don’t use any frameworks for maintaining my website; I just write everything by hand. I don’t change my website often…

The "Right Way" to Ensemble Language Models

Suppose you have n language models with embedding size d, vocabulary size v, and softmax matrices $W_1, W_2, \ldots, W_n \in \mathbb{R}^{v \times d}$, and you want to sample from them as an ensemble. One…

Research Interest Demo

I have created a discord server for people interested in collaborating with the lab. Email me for an invite!

Obtaining logprobs from an LLM API

Many LLM APIs give top-k logprobs in their outputs. What if we want to obtain all the logprobs? Here I present two algorithms for obtaining logprobs from an LLM API. Both of these depend on the API allowing us to add a logit bias to…
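
To make the logit-bias idea concrete, here is a minimal sketch of one way it could work. It is not necessarily either of the post's algorithms. It assumes a hypothetical API that returns only the top-k logprobs and accepts a per-token logit bias; `make_mock_api` and `recover_logprob` are illustrative names, and the mock stands in for a real endpoint. Biasing token i by B pushes it into the top-k. If q is its biased probability, its unbiased probability is p = q / (q + e^B (1 - q)).

```python
import math


def make_mock_api(logits, k=1):
    """Hypothetical stand-in for an LLM API (an assumption, not a real service).

    Returns a function that applies an optional logit bias and reports
    only the top-k tokens with their logprobs.
    """
    def api(logit_bias=None):
        biased = list(logits)
        for token, b in (logit_bias or {}).items():
            biased[token] += b
        # Numerically stable log-softmax.
        m = max(biased)
        log_z = m + math.log(sum(math.exp(x - m) for x in biased))
        logprobs = [x - log_z for x in biased]
        top = sorted(range(len(biased)), key=lambda i: logprobs[i], reverse=True)[:k]
        return {i: logprobs[i] for i in top}
    return api


def recover_logprob(api, token, bias=10.0):
    """Recover the unbiased logprob of `token` from one biased query."""
    response = api(logit_bias={token: bias})
    if token not in response:
        raise ValueError("bias too small: token did not reach the top-k")
    log_q = response[token]
    q = math.exp(log_q)
    one_minus_q = -math.expm1(log_q)  # 1 - q, computed accurately when q is near 1
    # log p = log q - B - log((1 - q) + q * e^{-B})
    return log_q - bias - math.log(one_minus_q + q * math.exp(-bias))


if __name__ == "__main__":
    api = make_mock_api([2.0, 1.0, 0.0, -1.0, -3.0], k=1)
    print([round(recover_logprob(api, t), 4) for t in range(5)])
```

This sketch spends one query per token. The bias must be large enough to lift the token into the top-k, but a very large bias pushes q so close to 1 that 1 - q loses floating-point precision.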

Deep BA Sampling

TL;DR: we can use any intermediate LM representation to prove that a subset of next-token candidates have non-zero probability.

Heavy tails and diversity in model distributions

Direct sampling from model output distributions often gives incoherent outputs. Some have attributed this to a heavy tail, i.e., the model assigns too much probability to low-probability tokens. My goal is to test this hypothesis.

Visualizations

These are some visualizations I have made over the years, both for academia and for fun!

Configuring Zathura

I finally got Zathura (the PDF viewer) configured the way I want it on macOS. I installed it using Homebrew following these instructions. I set up an Automator script to launch Zathura for me…

Washington State

I learned some Blender and used some open-source elevation data to make a nice-looking relief map of Washington State. Check out how I made it here. 📍

Camping in Cottonwood Wash

This last weekend, Caitlyn and I backpacked up Cottonwood Wash in Utah’s San Rafael Swell, a beautiful canyon all to ourselves. We even spotted some petroglyphs. ⛰️

Course notes from CS183

These are my notes from the course CS183: Foundations of Machine Learning. They are imperfect and incomplete but I really enjoyed making them. If you would like to make edits…