2025-04-03
pico.sh seems like a cool, minimal-effort tool for blogging, but it doesn’t support math, so I’m going to roll my own.
After wrapping up my COLM submission, I want to get back into a project that I set aside a couple of months ago.
The idea is that token rankings act as a message authentication code (MAC) for LLM outputs in APIs. If you know the model params, it is easy to produce and verify a top-\(k\) ranking that could have come from the model. If you don’t have the model params, I conjecture that it is NP-hard to do either. Thus, the params function as the secret key, and the top-\(k\) outputs function as the authentication code (known as the “tag”).
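To make the analogy concrete, here is a minimal numpy sketch of the easy direction, under toy assumptions of my own (a single linear unembedding rather than a full transformer, and hypothetical names `topk_tag` / `verify_tag`). The conjectured hard part, which this sketch does not touch, is forging a valid tag *without* \(W\).

```python
import numpy as np

def topk_tag(W, h, k):
    """Produce the top-k "tag": token ids ranked best-first under params W."""
    logits = W @ h                      # one logit per vocabulary token
    return tuple(np.argsort(-logits)[:k])

def verify_tag(W, h, k, tag):
    """Anyone holding the params W can recompute the ranking and check it."""
    return topk_tag(W, h, k) == tag

rng = np.random.default_rng(0)
V, d, k = 50, 8, 5                      # toy vocab size, hidden dim, tag length
W = rng.normal(size=(V, d))             # the "secret key": model params
h = rng.normal(size=d)                  # hidden state for some prompt

tag = topk_tag(W, h, k)                 # produced by the API
assert verify_tag(W, h, k, tag)         # verified by anyone else holding W
```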
The NP-hardness claim rests on the fact that oriented matroid (OM) realizability is \(\exists\mathbb{R}\)-complete, where \(\exists\mathbb{R}\) is the complexity class satisfying \(\mathsf{NP}\subseteq\exists\mathbb{R}\subseteq\mathsf{PSPACE}\), so \(\exists\mathbb{R}\)-hardness implies NP-hardness. My work so far has shown that learning an unembedding matrix that realizes the set of top-\(k\) rankings a model outputs is interreducible with OM realizability, meaning that stealing the unembedding matrix is \(\exists\mathbb{R}\)-complete. This is a nice assurance for API designers: assuming \(\exists\mathbb{R}\)-hard problems are intractable, adversaries cannot recover the model parameters from the rankings alone.
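Roughly, and in my notation rather than anything fixed in the writeup yet, the adversary’s problem is a feasibility question over strict inequalities:

\[
\exists\, W \in \mathbb{R}^{V\times d},\; h_1,\dots,h_T \in \mathbb{R}^{d} \;:\quad (w_i - w_j)^{\top} h_t > 0 \quad \text{whenever token \(i\) outranks token \(j\) in observation \(t\),}
\]

where \(w_i\) is the \(i\)-th row of \(W\). Since both \(W\) and the \(h_t\) are unknown, the constraints are bilinear, and systems of strict polynomial inequalities like this are the canonical \(\exists\mathbb{R}\) format; the sign patterns the \(h_t\) induce on the difference vectors \(w_i - w_j\) are (roughly) the covector data of the oriented matroid being realized.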
Having established that you cannot exactly learn the unembedding matrix from the API, a follow-up question is whether you can learn an approximately correct OM that matches the observed sets of top-\(k\) tokens. To prove the negative, I want to show both that you would need to see an exponential number of rankings to eliminate the alternative OM candidates, and that the surviving alternatives are problematic, i.e., misclassify unseen rankings at a high rate. This is hard, since a single ranking can probably eliminate an exponential number of candidates at each step. The other approach is to argue that any OM representation consistent with the data would need to be exponentially large, which would itself be an obstacle for the learner.
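As an intuition pump for the elimination dynamics, here is a toy sketch under invented assumptions: a finite pool of random candidate matrices standing in for OM candidates, rather than the continuous parameter space the real argument lives in.

```python
import numpy as np

rng = np.random.default_rng(1)
V, d, k = 20, 3, 3                # toy vocab size, hidden dim, ranking length
n_candidates = 10_000

W_true = rng.normal(size=(V, d))                     # the model the API actually uses
candidates = rng.normal(size=(n_candidates, V, d))   # stand-ins for alternative candidates
alive = np.ones(n_candidates, dtype=bool)

for t in range(10):
    h = rng.normal(size=d)                           # hidden state for a fresh query
    tag = np.argsort(-(W_true @ h))[:k]              # observed top-k ranking
    preds = np.argsort(-(candidates @ h), axis=1)[:, :k]
    alive &= (preds == tag).all(axis=1)              # kill every disagreeing candidate
    print(f"after observation {t + 1}: {alive.sum()} candidates remain")
```

In runs like this, even a single ranking typically eliminates the vast majority of unstructured random candidates, which is exactly why the lower bound can’t just count eliminations and has to reason about the structure of the surviving region instead.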