byte pair encoding

noun

Definitions

  1. A lossless data compression algorithm that iteratively replaces the most frequent pair of adjacent bytes in a sequence with a new byte not already present in the data.

  2. A subword tokenization method that iteratively merges the most frequent pair of adjacent symbols (initially single characters or bytes) in a corpus into a new, longer token, typically until a predefined vocabulary size is reached.
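
Both senses can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not a reference implementation: the names `replace_pair`, `bpe_compress`, `bpe_decompress`, and `learn_merges` are hypothetical, ties between equally frequent pairs go to the pair seen first, and sense 2 stops after a fixed number of merges rather than at a target vocabulary size.

```python
from collections import Counter


def replace_pair(seq, pair, new):
    """Scan left to right, replacing each non-overlapping occurrence of pair with new."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def bpe_compress(data: bytes):
    """Sense 1: substitute unused byte values for the most frequent adjacent byte pairs."""
    seq = list(data)
    used = set(data)   # byte values present in the original data
    table = {}         # replacement byte -> the pair it stands for, in creation order
    while True:
        unused = next((b for b in range(256) if b not in used and b not in table), None)
        pairs = Counter(zip(seq, seq[1:]))
        if unused is None or not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:  # assumption: a pair seen only once is not worth replacing
            break
        table[unused] = pair
        seq = replace_pair(seq, pair, unused)
    return bytes(seq), table


def bpe_decompress(packed: bytes, table):
    """Undo bpe_compress by expanding replacement bytes, newest substitution first."""
    out = list(packed)
    for byte, pair in reversed(list(table.items())):
        expanded = []
        for b in out:
            expanded.extend(pair if b == byte else (b,))
        out = expanded
    return bytes(out)


def learn_merges(words, num_merges):
    """Sense 2: learn an ordered list of symbol-pair merges from a list of words."""
    corpus = Counter(tuple(w) for w in words)  # word as a tuple of symbols -> frequency
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = pairs.most_common(1)[0][0]
        merges.append(best)
        merged_corpus = Counter()
        for symbols, freq in corpus.items():
            merged_corpus[tuple(replace_pair(symbols, best, best[0] + best[1]))] += freq
        corpus = merged_corpus
    return merges


data = b"aaabdaaabac"
packed, table = bpe_compress(data)
restored = bpe_decompress(packed, table)
merges = learn_merges(["low", "low", "lower", "lowest"], 2)
```

In this sketch the substitution table has to be stored alongside the packed bytes, since decompression is impossible without it. The list returned by `learn_merges` is ordered, and a tokenizer built on it would apply the merges to new text in that same order.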

sense glosses and etymology drawn from English Wiktionary · source · CC-BY-SA