An embedding maps a word or sentence to a point in high-dimensional space, such that meaning ≈ proximity.
"holka" → [0.12, -0.34, 0.87, … ] (768 numbers)
"děvče" → [0.11, -0.31, 0.85, … ] ← nearby
"auto" → [0.92, 0.54, -0.23, … ] ← far away
You can do arithmetic on meaning:
king − man + woman ≈ queen
This is not a trick — it reflects statistical regularities learned from billions of words of text.
Sentence embeddings extend the same idea to whole sentences and paragraphs, enabling semantic search: "find entries that mean something similar to this query."
Origin — Tomáš Mikolov, Brno, 2013
Word2Vec (Mikolov et al., 2013) was the first practical dense word embedding method. Published while Mikolov was at Google Brain; his doctoral research was done in Brno.
Mikolov did his MSc (2007) and PhD (2012) at FIT VUT — Faculty of Information Technology, Brno University of Technology.
From words to sentences:
| Year |
Method |
What changed |
| 2013 |
Word2Vec |
Word-level embeddings |
| 2018 |
BERT |
Context-aware; same word, different vector per context |
| 2019 |
Sentence-BERT |
Full sentence → single vector; enables fast similarity search |
| 2020 |
Multilingual SBERT |
50+ languages in one shared space |
We use Multilingual SBERT (the 2020 step) — that's paraphrase-multilingual-mpnet-base-v2.