alignment tax
noun
Etymology
First attested in a 2019 speech by computer scientist Paul Christiano (see quotation), who attributed the idea to AI researcher and writer Eliezer Yudkowsky.
Definitions
A cost to the capabilities of an artificial intelligence resulting from the effects of aligning it with human ethics and morality.
- The fact that larger models are less subject to forgetting may be related to the fact that larger models do not incur significant alignment taxes.
- We want an alignment procedure that avoids an alignment tax, because it incentivizes the use of models that are unaligned but more capable on these tasks.
sense glosses and etymology drawn from English Wiktionary · source · CC-BY-SA