Aka: How to build a font search engine using Google Fonts, Supabase, and Mistral
Over the last few weeks the most requested feature on brandmint has been for a "font finder".
The use case: you're creating a website or logo and you need a font that captures something abstract like "silent sophistication" or "crisp clean with a little bit of soul". You can click around on Google Fonts, but the search function there seems to expect you to know the names of the fonts. You don't know the names of the fonts, that's why you're searching!
This is a nice search problem. The solution looks something like this: describe each font in text, turn those descriptions into embeddings, and rank fonts by how close they sit to an embedded search query.
There are plenty of ways to skin this cat, many of them decades old, so I was a bit surprised to find that Google Fonts hasn't made any effort to solve this. If you're looking for an "exciting" font there, this is what they'll tell you:
They can't find any exciting fonts?!
Don't believe them; there are many exciting fonts out there! A great solution would include usage data and a wide catalog, but as an immediate exercise we can use new - or newly cheap - AI services to put together a good solution to this problem.
The most notable part of this is being able to send a picture of some text to a VLM and have it describe the qualities of the font with reasonable success. From there, it's just a question of packaging everything up into embeddings and using vector search.
Sidenote I: This cookbook will assume you know your way round embeddings and the like. If you're new to vector search check out my guide on how to build your own color search engine which is a bit more beginner-friendly.
Sidenote II: Strictly speaking we're not talking about fonts (e.g. Arial Italic Size 15) or typefaces (e.g. Arial Italic), but families of typeface (e.g. Arial). But the distinction has blurred over time and "fonts" just rolls off the tongue more easily, so I'll use that term throughout this guide.
I did look more widely for a vector-based / semantic search for fonts and couldn't find a live example anywhere online. I found a font search website discussed on Reddit but that site has since gone defunct, so presumably the demand here is pretty light. Still, a few people have asked for it so I've added it to Brandmint.
The finished product, which I called Font Finder, is here:
https://brandmint.ai/font-finder
And here's how I built it, and how you too can build your own!
For this font search engine the source data is from the github repo for Google Fonts, which is licensed under various permissive licenses.
The fonts in this project are organized by license type.
ofl/ – Fonts under the SIL Open Font License 1.1. This is the majority of fonts on Google Fonts.
apache/ – Fonts under the Apache License 2.0.
ufl/ – Fonts under the Ubuntu Font License 1.0 (used mainly by the Ubuntu font family).
There are lots of other folders that seem promising but are distractions for our simple build. If you're curious:
tags/ – Contains classification tags for fonts. This might well help in classifying our fonts, but I found I didn't need it and omitted it to keep things simple.
axisregistry/ – Downstream version of a Google repo which defines deeper variable support for some fonts.
lang/ – Downstream version of a Google repo that contains data on languages, scripts, and regions used to classify fonts' language support on Google Fonts.
cc-by-sa/ – Educational material for the Google Fonts website.
catalog/ – Background info on specific font designers.
Within each of the three important license directories (ofl/, apache/, ufl/), font families are organized by family name. Each family has its own subfolder named after the font family. Folder naming conventions for families generally use lowercase letters and no spaces or special characters (often just removing spaces from the font's name). For example, "Open Sans" is found in ofl/opensans/.
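If you want to map a family name to its folder in code, a tiny helper like this covers most cases (a hypothetical sketch based on the naming convention above; a handful of families deviate from it):
// Guess the repo subfolder for a font family, assuming the common convention of
// lowercasing and stripping spaces/special characters (e.g. "Open Sans" -> "opensans").
// This is a heuristic only; a few families in the repo don't follow it.
const guessFamilyFolder = (familyName: string): string =>
  familyName.toLowerCase().replace(/[^a-z0-9]/g, "");

// guessFamilyFolder("Open Sans") === "opensans"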
Each font family folder typically contains:
FontFamily-Style.ttf – the actual font binaries for each style (FontFamily-Bold.ttf, etc.)
FontFamily[wdth,wght].ttf – with the values in brackets indicating which axis tags are supported
METADATA.pb – machine-readable metadata
DESCRIPTION.en_us.html – English description of the font family, typically a paragraph or two about its history and design
article/ARTICLE.en_us.html
LICENSE.txt – full license text
There seems to be some variance between folders, but the vast majority of folders I explored had the above structure.
If this is a new project:
Supabase > new project > follow the flow to create a new PostgreSQL
If this is the first table in your database to use vectors, you'll need to add the vector extension. In the SQL Editor, choose New SQL Snippet (Execute SQL Queries).
There are two queries to run here:
CREATE SCHEMA IF NOT EXISTS extensions;
CREATE EXTENSION IF NOT EXISTS vector WITH SCHEMA extensions;
Even though you've just added the extensions schema to support vectors, the fonts table itself will live in the public schema.
Supabase does have a UI for creating a new Table but it doesn't let you specify vector size, so you'll need to do this with a SQL command.
For your font data, keep it as tight as you can while still providing a good search experience. Sketch out what you want...
And then use an LLM to convert it to SQL:
CREATE TABLE public.fonts (
id BIGSERIAL PRIMARY KEY,
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
name TEXT UNIQUE NOT NULL,
url TEXT,
designer TEXT,
year INTEGER,
license TEXT,
copyright TEXT,
category TEXT,
stroke TEXT,
ai_descriptors TEXT[],
description_p1 TEXT,
summary_text_v1 TEXT,
embedding_mistral_v1 VECTOR(1024)
);
Sidenote III: It's a pity we have to treat each typeface family as a single unit. Typically a family will support a number of different "weights". Each weight can be described with a number (e.g. "700") or a word (e.g. "Bold"). The number is standardized in the CSS Fonts specification for font-weight; it's not a direct measurement of anything. It gives designers precise control over font weight, and weight will be a vital part of font selection in the wild.
If you're looking for the perfect font there will be times when the answer is ExtraLight Arial and some very different times when the answer is ExtraBold Arial. That said, we have to simplify somewhere. If there's user demand for more granular entries they can always be added in a future version.
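For reference, here is the conventional mapping between weight names and the numeric font-weight scale (CSS itself only defines keywords for 400 and 700; the other names are widely used conventions):
// Commonly used font-weight names and their CSS numeric values.
// CSS only defines the keywords "normal" (400) and "bold" (700);
// the other names are widespread conventions on the same 100-900 scale.
const fontWeightNames: Record<string, number> = {
  Thin: 100,
  ExtraLight: 200,
  Light: 300,
  Regular: 400,
  Medium: 500,
  SemiBold: 600,
  Bold: 700,
  ExtraBold: 800,
  Black: 900,
};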
The Google Fonts repo on GitHub has all the data we need. We're going to start by cloning that repo locally.
git clone https://github.com/google/fonts.git
This will occupy 5.6 GB of storage on your machine.
For this project I'm using Mistral models throughout and coding with TypeScript.
First I create a simple client object to call Mistral:
import { Mistral } from "@mistralai/mistralai";
const apiKey = process.env.MISTRAL_API_KEY;
if (!apiKey) {
throw new Error("MISTRAL_API_KEY is not set");
}
export const clientMistral = new Mistral({ apiKey });
Then a function to call it:
import { clientMistral } from "../connections/clientMistral";
const getEmbeddings = async (inputs: string[]): Promise<string[]> => {
const response = await clientMistral.embeddings.create({
model: "mistral-embed",
inputs: inputs,
});
const outputs = response.data;
const validOutputs = outputs.filter(
(output) => output.object === "embedding" && output.index !== undefined
);
return validOutputs.map((output) => JSON.stringify(output.embedding));
};
Then a super simple script that calls the function:
const testRun = async () => {
  const embeddings = await getEmbeddings(["red"]);
  const testEmbedding = embeddings[0];
  console.log(testEmbedding);
  // Parse the JSON string back into an array to confirm the 1024-dimension vector
  console.log(JSON.parse(testEmbedding).length);
};
testRun();
This should run fine with Node. Personally I use Bun for simple scripts like this, out of habit.
bun src/script.ts
To connect your code with Supabase you'll need your Project ID (find it in Settings > General > Project Settings).
It will look something like this: abcdefghijklmnopqrst
Your .env file will need a SUPABASE_URL, which is built around the Project ID using this format:
https://abcdefghijklmnopqrst.supabase.co
Your SUPABASE_SERVICE_ROLE_KEY is a longer string that you'll find in Settings > API Keys > Reveal.
Here's the code to create your Supabase client on the server:
import { createClient } from "@supabase/supabase-js";
import { Database } from "../types/supabase";
const supabaseUrl = process.env.SUPABASE_URL;
const supabaseKey = process.env.SUPABASE_SERVICE_ROLE_KEY;
if (!supabaseUrl || !supabaseKey) {
throw new Error("SUPABASE_URL or SUPABASE_SERVICE_ROLE_KEY is not set");
}
export const clientSupabase = createClient<Database>(supabaseUrl, supabaseKey);
Supabase gives you an npx script for generating a types file. Here's how to use it:
npm i supabase
npx supabase login
npx supabase gen types typescript --project-id abcdefghijklmnopqrst > types/supabase.ts
Save that command as a comment in your .env file; you'll use it a ton!
Our types/supabase.ts file makes it nice and clear what shape our data needs to be in. So our target state is:
type FontDBReady = {
name: string;
category: string | null;
copyright: string | null;
designer: string | null;
license: string | null;
stroke: string | null;
year: number | null;
url: string | null;
description_p1: string | null;
ai_descriptors: string[];
summary_text_v1: string | null;
embedding_mistral_v1: string;
};
So if we can get our data objects to look like that, we can save them into our database with a super simple upsert(font, { onConflict: "name" }) command, without any mapping of parameters.
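Here's a minimal sketch of that save step, using the clientSupabase client from earlier and the FontDBReady type above:
// Upsert one assembled font row; "name" is the unique column in our schema,
// so re-running the ETL updates existing rows instead of duplicating them.
const saveFont = async (font: FontDBReady) => {
  const { error } = await clientSupabase
    .from("fonts")
    .upsert(font, { onConflict: "name" });
  if (error) {
    console.error(`Failed to save ${font.name}:`, error.message);
  }
};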
However, to assemble this data we need to collect info from a few different places. I like to reflect this in the type structure. It's like we're assembling a car, and the car has to pass through a few stations. At the first station we expect a certain set of pieces to be bolted together. That core will then be passed to the next station which will bolt on some extras, and so on until the car is complete.
In our case, there are three stations:
/** Everything here comes from METADATA.pb */
type FontBasics = {
name: string;
category: string | null; // include classifications here, space separated if multiple
copyright: string | null; // Find this in fonts[0]
designer: string | null;
license: string | null;
stroke: string | null;
year: number | null; // Just use first four digits of date_added
};
type FontEmbeddingReady = FontBasics & {
url: string | null; // Can be constructed from name, separated by +, e.g.: https://fonts.googleapis.com/css2?family=Family+Name
description_p1: string | null; // Take the first <p> element of DESCRIPTION.en_us.html
ai_descriptors: string[]; // List of adjectives from multimodal AI assessment of visual
summary_text_v1: string | null; // Combination of all relevant descriptive elements. This is what we will vectorize.
};
type FontDBReady = FontEmbeddingReady & {
embedding_mistral_v1: string;
};
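As a quick illustration of the url field comment above, the Google Fonts CSS2 URL can be built directly from the family name:
// Build the Google Fonts CSS2 URL for a family by replacing spaces with "+",
// as described in the FontEmbeddingReady comment above.
const buildFontUrl = (name: string): string =>
  `https://fonts.googleapis.com/css2?family=${name.trim().replace(/\s+/g, "+")}`;

// buildFontUrl("Open Sans") -> "https://fonts.googleapis.com/css2?family=Open+Sans"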
My full ETL script can be found here:
https://github.com/yablochko8/semantic-search-fonts/blob/main/src/script.ts
I've kept it heavily commented so if you're following this guide with the exact same source data now is the time to dive into that code.
Assembling the FontBasics object is cheap, easy, and all local.
Key parts:
- the METADATA.pb file
- the <p> value I can find in either DESCRIPTION.en_us.html or inside article/ARTICLE.en_us.html
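To make the METADATA.pb step concrete, here's a rough sketch of pulling a FontBasics object out of the textproto with regexes. My real script (linked above) is the reference; treat this as an illustration of the idea rather than a drop-in parser:
import { readFileSync } from "fs";

// Rough-and-ready extraction of FontBasics from a METADATA.pb textproto file.
// A proper protobuf parser would be more robust; this is enough for a quick ETL pass.
const parseMetadata = (path: string): FontBasics => {
  const text = readFileSync(path, "utf-8");

  // First `key: "value"` occurrence anywhere in the file
  const firstString = (key: string): string | null =>
    text.match(new RegExp(`\\b${key}:\\s*"([^"]*)"`))?.[1] ?? null;

  // Families can list several category / classifications lines; join them with spaces
  const categories = [...text.matchAll(/^(?:category|classifications):\s*"([^"]*)"/gm)]
    .map((m) => m[1])
    .join(" ");

  const dateAdded = firstString("date_added");

  return {
    name: firstString("name") ?? "", // the top-level name comes before the fonts {} blocks
    category: categories || null,
    copyright: firstString("copyright"), // first occurrence sits inside fonts[0]
    designer: firstString("designer"),
    license: firstString("license"),
    stroke: firstString("stroke"),
    year: dateAdded ? parseInt(dateAdded.slice(0, 4), 10) : null, // first four digits of date_added
  };
};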
For my first pass at this I thought I could rely on the text descriptions, but the descriptions sometimes gave offbeat results.
(If you're curious I've left this version live on Brandmint, just select Algorithm: V1)
To get a more standardized assessment, our function flow will look like this:
- render an image of some sample text in the font
- send that image to a multimodal model (pixtral-12b) and have it describe the font with a list of adjectives
- combine everything into a summary string: ${name} is a ${adjectives} font designed by ${designer} in ${year}. ${p1Description}
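Here's a sketch of the middle step: asking pixtral-12b to describe a rendered image of the font with adjectives. The prompt and the data-URL image input are illustrative assumptions; check the Mistral SDK docs for the current vision message format if it has changed:
// Ask a vision model to describe a font sample with a short list of adjectives.
// `imageBase64` is assumed to be a PNG of sample text rendered in the font
// (produced elsewhere, e.g. with a canvas library).
const getFontAdjectives = async (imageBase64: string): Promise<string[]> => {
  const response = await clientMistral.chat.complete({
    model: "pixtral-12b",
    messages: [
      {
        role: "user",
        content: [
          {
            type: "text",
            text: "Describe the visual character of the font in this image with 5-8 adjectives. Reply with a comma-separated list only.",
          },
          {
            type: "image_url",
            imageUrl: `data:image/png;base64,${imageBase64}`,
          },
        ],
      },
    ],
  });
  const message = response.choices?.[0]?.message?.content;
  const raw = typeof message === "string" ? message : "";
  // Split the comma-separated reply into a clean array of adjectives
  return raw
    .split(",")
    .map((adjective) => adjective.trim().toLowerCase())
    .filter(Boolean);
};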
I didn't expect to need an extra step here, but here's the problem: sometimes the name of a font is hugely useful (the "Orbitron" font is nice and futuristic) and sometimes it's hugely unhelpful ("Are you serious" is decidedly not serious, but it elevates the joky font to the top of the results for a "serious" query). We can let an LLM figure this out.
Good text for embedding should be as descriptive and information-rich as possible, focusing on the essential characteristics of the item you want to represent. We want to avoid negations (e.g., "not serious") as they confuse embedding models. The text should only specify what the font is rather than what it is not. The ideal is a comma-separated list or a short, well-structured paragraph. We can also include the font's name and designer if they're relevant and not misleading.
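Putting that together, here's one way the summary text can be assembled. The isNameDescriptive helper is hypothetical, showing how an LLM can decide whether the name belongs in the summary; my actual prompt lives in the ETL script linked above:
// Ask a small chat model whether the font's name genuinely describes its look,
// or is misleading (like "Are You Serious"). Returns false on any uncertainty.
const isNameDescriptive = async (
  name: string,
  adjectives: string[]
): Promise<boolean> => {
  const response = await clientMistral.chat.complete({
    model: "mistral-small-latest",
    messages: [
      {
        role: "user",
        content: `A font called "${name}" looks: ${adjectives.join(", ")}. Does the name accurately describe its visual style? Answer YES or NO only.`,
      },
    ],
  });
  const answer = response.choices?.[0]?.message?.content;
  return typeof answer === "string" && answer.trim().toUpperCase().startsWith("YES");
};

// Build the embedding-ready summary text, leaving the name out when it would mislead.
const buildSummaryText = (font: FontEmbeddingReady, includeName: boolean): string => {
  const subject = includeName ? font.name : "This";
  const adjectives = font.ai_descriptors.join(", ");
  const designedBy = font.designer ? ` designed by ${font.designer}` : "";
  const inYear = font.year ? ` in ${font.year}` : "";
  return `${subject} is a ${adjectives} font${designedBy}${inYear}. ${font.description_p1 ?? ""}`.trim();
};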
We pass our embedding-ready text, which I call "summaryTextAdvanced", to our getEmbeddings function, and store the resulting vector in our embedding_mistral_v1 field. Supabase expects the vector as a string (a JSON array of numbers), which is why getEmbeddings stringifies each embedding before returning it.
Usually for an ETL we would run some test writes before loading the full data, but with fewer than 2,000 entries it's such a small dataset that it seemed silly to split it out. So at this point we just run the full ETL script.
Because there are fewer than 10,000 entries in this data, we definitely do not need to create an index.
We'll get exact results within a few milliseconds; an index would complicate our code and actually give slightly less accurate (approximate) results.
If you're dealing with a larger dataset, you can see more details on adding an index in my previous guide.
Usually when we want to call a Supabase database from code we use the Supabase SDK, and that will have predefined functions to let you add, delete, update etc.
However, calling for results sorted by embedding distance is beyond the scope of their current SDK, so we'll need to create our own custom function that we can call in a controlled way.
This is called an RPC (Remote Procedure Call) Function.
For our needs, we're going to query the embedding column and get results back with the font's name, its metadata fields, and a distance score. We don't need to specify which index we're calling, as there should be only one for that column.
Here's the code for creating the function:
CREATE OR REPLACE FUNCTION search_embedding_mistral_v1(
query_embedding vector(1024),
match_count int default 10
)
RETURNS TABLE (
name text,
category text,
copyright text,
designer text,
license text,
stroke text,
year integer,
url text,
description_p1 text,
distance float
)
LANGUAGE sql STABLE
AS $$
SELECT
c.name,
c.category,
c.copyright,
c.designer,
c.license,
c.stroke,
c.year,
c.url,
c.description_p1,
c.embedding_mistral_v1 <#> query_embedding AS distance
FROM (
SELECT * FROM fonts
ORDER BY embedding_mistral_v1 <#> query_embedding
LIMIT match_count
) c;
$$;
For the curious: language sql stable tells Postgres that this is a SQL function that will always return the same output for the same inputs, as long as the underlying data hasn't changed. Unlike 'volatile' functions, which may return different results even with identical inputs, 'stable' functions are deterministic within a single statement. This lets Postgres optimize the function calls, since it knows the results will be consistent for the same parameters within that statement.
We're going to call this named function from our code, so we want Intellisense to expect it. We run the command that we created in Step 5 above.
npx supabase gen types typescript --project-id YOUR_PROJECT_ID > src/types/supabase.ts
Remember: we can't query directly with the query string; we need to convert it into an embedding and query with that.
const testQuery = async () => {
  const [testEmbedding] = await getEmbeddings(["punk eleganza"]);
  const { data, error } = await clientSupabase.rpc(
    "search_embedding_mistral_v1",
    {
      query_embedding: testEmbedding, // already a JSON-stringified vector
      match_count: 10,
    }
  );
  if (error) console.error(error);
  console.log(data);
};
Sure enough, a query for "punk eleganza" gives us the named font "Belleza".
I'm sure I don't need to tell you that "Belleza is a humanist sans serif typeface inspired by the world of fashion. With classic proportions, high contrast in its strokes and special counters, it provides a fresh look that reminds readers of elegant models and feminine beauty."
https://fonts.google.com/specimen/Belleza
Serving up sans serif realness, success!
And of course, if we're looking for something exciting, we now have the entire catalog ranked by exciting-ness:
Integrating with a production service is fairly straightforward; the server code matches the pattern of the testQuery above.
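For example, the server-side search can be a thin wrapper around the same two calls (a sketch; your routing and error handling will differ):
// Server-side search: embed the user's query, then rank fonts by vector distance.
export const searchFonts = async (query: string, limit = 10) => {
  const [queryEmbedding] = await getEmbeddings([query]);
  const { data, error } = await clientSupabase.rpc("search_embedding_mistral_v1", {
    query_embedding: queryEmbedding, // already a JSON-stringified vector
    match_count: limit,
  });
  if (error) throw new Error(`Font search failed: ${error.message}`);
  return data;
};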
On the frontend, the only novel piece is having to load lots of fonts dynamically on the fly. Here's what that looks like in React:
- a useEffect hook runs when the font.url is available
- it appends a <link> element to the document head to load the font CSS
The code below shows the implementation:
useEffect(() => {
  // Create a link element to load the font CSS
  const link = document.createElement("link");
  link.href = font.url;
  link.rel = "stylesheet";
  link.type = "text/css";

  // Flag the font as ready once the stylesheet loads
  link.onload = () => {
    setFontLoaded(true);
  };
  link.onerror = () => {
    console.error("Failed to load font:", font.name);
    setFontLoaded(true); // Still set to true to show fallback
  };

  document.head.appendChild(link);

  // Cleanup: remove the link when the component unmounts or font.url changes
  return () => {
    document.head.removeChild(link);
  };
}, [font.url]);
Then inject the name into the JSX...
<div
style={{
fontFamily: fontLoaded ? font.name : "system-ui, sans-serif",
}}
>
{fontLoaded ? displayText : "Loading font..."}
</div>
This probably took about 20 hours including the write-up, much longer than it took to build the color search engine last week. A reminder that you can learn the tools, but finding your way around a new dataset will always take time.
Cost breakdown:
Total = €1.51
I probably could have indulged in larger models.
If you've scrolled to the end and just want to search for fonts, jump over to the Font Finder on brandmint.ai
I'd love to hear your thoughts on this guide or related topics.