A reference doc for understanding the engineering behind this project.
Polymarket hosts prediction markets: people bet on real-world outcomes (elections, economic events, etc.), and the market price reflects the crowd's estimated probability. For example, a YES share trading at $0.70 means the crowd puts the chance of YES at roughly 70%.
The hypothesis this system tests: semantically similar markets tend to resolve the same way. If "Will X happen by March?" resolves YES, then "Will X happen by June?" probably will too. The system finds these relationships automatically using embeddings, builds a directed graph, and monitors for signals when one market resolves.
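The graph-building step can be sketched as follows. This is a minimal illustration, not the project's actual code: the questions, stand-in vectors, and the 0.85 threshold are all assumptions chosen for the example.

```python
import numpy as np

# Hypothetical market questions (illustrative only)
questions = [
    "Will X happen by March?",
    "Will X happen by June?",
    "Will Y win the election?",
]

# Stand-in unit vectors in place of real model embeddings
emb = np.array([
    [1.0, 0.0],
    [0.96, 0.28],  # nearly parallel to the first question
    [0.0, 1.0],    # unrelated
])

THRESHOLD = 0.85  # assumed cutoff; the real system's value may differ

# Directed edge i -> j when similarity exceeds the threshold
edges = [
    (i, j)
    for i in range(len(questions))
    for j in range(len(questions))
    if i != j and float(emb[i] @ emb[j]) >= THRESHOLD
]
print(edges)  # [(0, 1), (1, 0)] -- the two "Will X happen" markets link up
```

Once a market resolves, the system can walk the outgoing edges of that node to find the related markets worth monitoring.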
File: src/topic/utils/embeddings.py
An embedding is a fixed-size numerical vector (array of floats) that captures the meaning of text. The model used here is all-mpnet-base-v2 from the sentence-transformers library, which outputs 768-dimensional vectors.
How the embeddings are generated:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-mpnet-base-v2")

# Each market has a question + description; truncate the description to 200 chars
texts = [f"{m.question} {m.description[:200]}" for m in markets]
# SentenceTransformer encodes all texts into 768-dim vectors
embeddings = model.encode(texts, normalize_embeddings=True)
The normalize_embeddings=True flag is important: it makes every vector unit length (magnitude = 1.0), which means the dot product of two vectors equals their cosine similarity directly. Without normalization, you'd need to divide the dot product by the product of the vectors' magnitudes.
# Because vectors are normalized, dot product = cosine similarity
similarity = np.dot(embedding_a, embedding_b) # range: -1.0 to 1.0
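To see why normalization lets the dot product stand in for cosine similarity, here is a self-contained check with toy vectors (these are stand-ins, not real model outputs):

```python
import numpy as np

# Two toy vectors standing in for embeddings
a = np.array([3.0, 4.0])
b = np.array([4.0, 3.0])

# Full cosine similarity: dot product divided by the product of magnitudes
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# After scaling each vector to unit length, a plain dot product
# gives the same value with no division step
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)
dot = np.dot(a_unit, b_unit)

print(cosine, dot)  # both 0.96
```

Skipping the per-pair division matters in practice: comparing every market against every other is O(n²) dot products, and with pre-normalized vectors the whole similarity matrix is a single matrix multiply (embeddings @ embeddings.T).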