Architecture & Key Concepts

A reference doc for understanding the engineering behind this project.


1. The Core Idea

Polymarket hosts prediction markets: people bet on real-world outcomes (elections, economic events, etc.), and the market price reflects the crowd's estimated probability. For example, a YES share trading at $0.70 means the crowd puts the chance of YES at 70%.

The hypothesis this system tests: semantically similar markets tend to resolve the same way. If "Will X happen by March?" resolves YES, then "Will X happen by June?" probably will too. The system finds these relationships automatically using embeddings, builds a directed graph, and monitors for signals when one market resolves.
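As a rough illustration of that pipeline, the sketch below builds candidate edges between markets whose (normalized) embeddings exceed a similarity cutoff. The threshold value and the function name are illustrative assumptions, not taken from the codebase, and the real system's rule for directing edges (e.g., by resolution date) is not shown here.

import numpy as np

# Hypothetical sketch: SIM_THRESHOLD and build_edges are illustrative
# names, not from the project.
SIM_THRESHOLD = 0.80

def build_edges(embeddings: np.ndarray) -> list[tuple[int, int]]:
    # For unit-length vectors, the matrix product gives pairwise cosines.
    sims = embeddings @ embeddings.T
    n = len(embeddings)
    # Similarity is symmetric, so this yields edges in both directions;
    # the system's actual direction criterion is applied elsewhere.
    return [(i, j) for i in range(n) for j in range(n)
            if i != j and sims[i, j] >= SIM_THRESHOLD]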


2. Sentence Embeddings (the core technique)

File: src/topic/utils/embeddings.py

What are embeddings?

An embedding is a fixed-size numerical vector (array of floats) that captures the meaning of text. The model used here is all-mpnet-base-v2 from the sentence-transformers library, which outputs 768-dimensional vectors.
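As a quick, self-contained illustration (the question string is made up, not a real market):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-mpnet-base-v2")
vector = model.encode("Will the Fed cut rates by March?")
print(vector.shape)  # (768,) -- one float per dimension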

Key properties:

- Fixed size: every text, long or short, maps to a vector of the same dimensionality (768 here).
- Semantic: texts with similar meanings produce nearby vectors, even when they share no words.
- Comparable: cosine similarity between two vectors gives a single score for how related the two texts are.

How it works in this project

# Each market has a question + description
texts = [f"{m.question} {m.description[:200]}" for m in markets]

# SentenceTransformer encodes all texts into 768-dim vectors
embeddings = model.encode(texts, normalize_embeddings=True)

The normalize_embeddings=True flag is important: it scales every vector to unit length (magnitude = 1.0), so the dot product of two vectors equals their cosine similarity directly. Without normalization, you would have to divide by the product of their magnitudes.

import numpy as np

# Because the vectors are normalized, dot product = cosine similarity
similarity = np.dot(embedding_a, embedding_b)  # range: -1.0 to 1.0
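For contrast, here is a minimal sketch of the unnormalized case, using toy vectors rather than real embeddings:

import numpy as np

raw_a = np.array([1.0, 2.0, 3.0])  # toy unnormalized vectors
raw_b = np.array([2.0, 4.0, 5.0])

# Cosine similarity = dot product divided by the product of magnitudes
cos_sim = np.dot(raw_a, raw_b) / (np.linalg.norm(raw_a) * np.linalg.norm(raw_b))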

Why not just use keyword matching?