alignment tax

noun

Etymology

First attested in a 2019 speech by computer scientist Paul Christiano (see quotation), who attributed the idea to AI researcher and writer Eliezer Yudkowsky.

Definitions

  1. A cost to the capabilities of an artificial intelligence resulting from the effects of aligning it with human ethics and morality.

    • The fact that larger models are less subject to forgetting may be related to the fact that larger models do not incur significant alignment taxes.
    • We want an alignment procedure that avoids an alignment tax, because it incentivizes the use of models that are unaligned but more capable on these tasks.

sense glosses and etymology drawn from English Wiktionary · source · CC-BY-SA