#006 — Bias
Bias is systematic, patterned error that the agent producing it cannot recognize from inside. Humans inherit cognitive biases from heuristic shortcuts and culture. AI systems inherit biases from the data they were trained on. Institutions inherit biases by deferring to AI. The three have different correction tools and different blind spots, and biases that pass through more than one of them are especially difficult to dislodge.
The pattern: patterned error that the agent producing it cannot recognize from inside. Humans inherit biases from culture and cognition; AI inherits them from training data; institutions inherit them by deferring to AI.
🧠 In humans
Bias in cognition is systematic departure from a normative standard. From the 1970s onward, Tversky & Kahneman’s research program catalogued biases as the predictable failure points of heuristic shortcuts that ordinarily serve well: anchoring, availability, representativeness. A heuristic becomes visible as a bias at the points where it fails.
Anchoring (Tversky & Kahneman, 1974): when subjects are shown an arbitrary number before being asked to estimate a quantity, their estimates are pulled toward that anchor, even when the anchor is irrelevant. The effect is reliable enough that negotiators study it.
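Tversky & Kahneman described anchoring as insufficient adjustment: people start from the anchor and stop adjusting too early. The sketch below is a minimal, stylized version of that idea, assuming a fixed adjustment fraction. The function `anchored_estimate`, the 0.6 fraction, and all the numbers are illustrative assumptions, not fitted values.

```python
def anchored_estimate(anchor: float, belief: float, adjustment: float = 0.6) -> float:
    """Stylized anchor-and-adjust model (illustrative assumption).

    The estimate starts at `anchor` and closes only the fraction
    `adjustment` of the gap to the unanchored `belief`:
    1.0 means no anchoring, 0.0 means the anchor is repeated verbatim.
    """
    if not 0.0 <= adjustment <= 1.0:
        raise ValueError("adjustment must be in [0, 1]")
    return anchor + adjustment * (belief - anchor)


# Hypothetical subject whose unanchored estimate would be 30.
belief = 30.0
low_anchor_estimate = anchored_estimate(anchor=10.0, belief=belief)
high_anchor_estimate = anchored_estimate(anchor=65.0, belief=belief)

# Same belief, two irrelevant anchors, two different answers.
print(low_anchor_estimate, high_anchor_estimate)
```

With these made-up numbers the estimates come out at 22 and 44: one underlying belief, pulled in opposite directions by an irrelevant number.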
Availability heuristic: judgments of frequency or probability are influenced by how easily examples come to mind. Subjects asked whether more English words start with K or have K as their third letter typically say “start with K,” because K-initial words are easier to recall, although words with K in the third position are in fact more common.
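The frequency claim is checkable against any corpus. A minimal sketch, assuming you supply your own tokens from running English text; `k_position_counts` is a hypothetical helper, and the toy sentence below is made up and says nothing about English as a whole.

```python
from collections import Counter
from typing import Iterable


def k_position_counts(tokens: Iterable[str]) -> Counter:
    """Count tokens with 'k' as the first letter and tokens with 'k' as the third.

    The two tallies are independent, so one token can count toward both.
    Tokens that are not purely alphabetic (numbers, punctuation) are skipped.
    """
    counts = Counter(first=0, third=0)
    for token in tokens:
        word = token.lower()
        if not word.isalpha():
            continue
        if word[0] == "k":
            counts["first"] += 1
        if len(word) >= 3 and word[2] == "k":
            counts["third"] += 1
    return counts


sample = "I like to ask the baker about the cake he keeps by the kiln".split()
print(k_position_counts(sample))
```

Whether to count running tokens or distinct dictionary words is a design choice that can change the answer.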
Halo effect (Thorndike, 1920): a single salient positive trait pulls evaluations of unrelated traits in the same direction. Beautiful defendants are rated as less guilty; tall candidates as more competent.
Implicit Association Test (Greenwald, McGhee & Schwartz, 1998): reaction-time tasks reveal automatic associations (between race and threat, gender and competence) that subjects often do not consciously hold or endorse. The interpretation of IAT scores remains contested, but the existence of measurable implicit associations is well-established.
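IAT results are usually summarized as a D-score: the difference between mean response latencies in the two pairing conditions, scaled by the standard deviation of all latencies pooled across both. The sketch below computes only that core ratio, assuming latencies in milliseconds. It omits the trial exclusions, error handling, and block-pair averaging of the standard scoring procedure. The function name and the latencies are invented for illustration.

```python
from statistics import mean, stdev
from typing import Sequence


def iat_d_score(compatible_ms: Sequence[float], incompatible_ms: Sequence[float]) -> float:
    """Simplified IAT D-score (core ratio only).

    (mean latency in the "incompatible" pairing block - mean latency in the
    "compatible" pairing block) / sample SD of all latencies from both blocks.
    Positive values mean responses were slower in the incompatible pairing.
    """
    if len(compatible_ms) < 2 or len(incompatible_ms) < 2:
        raise ValueError("need at least two trials per block")
    pooled_sd = stdev(list(compatible_ms) + list(incompatible_ms))
    if pooled_sd == 0:
        raise ValueError("latencies have zero variance")
    return (mean(incompatible_ms) - mean(compatible_ms)) / pooled_sd


# Invented latencies (ms) for one hypothetical subject.
compatible = [600, 650, 700, 620]
incompatible = [750, 800, 780, 820]
print(round(iat_d_score(compatible, incompatible), 2))
```

The score measures a latency difference; what that difference means about the subject is the part that remains contested.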
Stereotype threat (Steele & Aronson, 1995): knowledge of a negative stereotype about one’s own group can degrade performance on tasks where the stereotype applies, independently of belief in the stereotype itself.
Canonical: Tversky & Kahneman (1974); Thorndike (1920); Greenwald, McGhee & Schwartz (1998); Steele & Aronson (1995).
🤖 In machines
AI systems acquire bias from the data they are trained on and from the procedures that select and weight that data. The bias is not added; it is inherited. Three classes are well-documented.
Training-data bias is the structural source. A model trained on text that reflects historical patterns reproduces those patterns, including patterns of historical discrimination. Gradient descent on a next-token-prediction objective has no incentive to deviate from the empirical distribution of its training corpus; the training loss is lowest when the model matches that distribution.
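The gradient-descent point can be checked in miniature. For a single context, cross-entropy loss is minimized exactly when the model’s next-token probabilities equal the corpus frequencies, and the gradient with respect to each logit is the predicted probability minus the empirical frequency. The sketch below runs plain gradient descent on the logits for one made-up context whose continuation is “she” 8 times and “he” 2 times. `fit_next_token` and the corpus are hypothetical; a real model conditions on far richer context, but for a model with enough capacity the same argument applies context by context.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fit_next_token(observed: List[str], vocab: List[str], lr: float = 0.5,
                   steps: int = 2000) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Fit next-token probabilities for one context by gradient descent on
    mean cross-entropy. Returns (fitted, empirical) distributions."""
    counts = Counter(observed)
    empirical = [counts[tok] / len(observed) for tok in vocab]
    logits = [0.0] * len(vocab)
    for _ in range(steps):
        probs = softmax(logits)
        # d(loss)/d(logit_i) = p_i - q_i, so the gradient vanishes only when p == q
        logits = [z - lr * (p - q) for z, p, q in zip(logits, probs, empirical)]
    return dict(zip(vocab, softmax(logits))), dict(zip(vocab, empirical))


# Hypothetical continuations of one context, skewed 8:2 in the corpus.
observed = ["she"] * 8 + ["he"] * 2
fitted, empirical = fit_next_token(observed, vocab=["she", "he"])
print(fitted, empirical)
```

The fitted model lands on the corpus skew: nothing in the objective rewards departing from it.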
Demographic disparities in deployed systems have been documented across hiring tools, image generation, facial recognition (Buolamwini & Gebru, 2018), healthcare allocation, and criminal-justice risk scoring. The COMPAS recidivism debate (Angwin et al. / ProPublica, 2016, against Northpointe’s response) became the canonical demonstration that the choice of fairness criterion (calibration vs. equalized odds) materially changes whether a system looks biased.
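The COMPAS disagreement can be reproduced with made-up counts. The sketch below gives two groups equal positive predictive value (predictive parity, used here as a thresholded stand-in for the calibration side) and then compares error rates (the equalized-odds side). The `Confusion` class and every count are hypothetical. When base rates differ and prediction is imperfect, the identity FPR = p/(1 − p) · (1 − PPV)/PPV · (1 − FNR), with p the group’s base rate, means that equal PPV and equal FNR force unequal FPR.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts for one group (hypothetical data)."""
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def ppv(self) -> float:  # P(reoffended | flagged high risk)
        return self.tp / (self.tp + self.fp)

    @property
    def fpr(self) -> float:  # P(flagged | did not reoffend)
        return self.fp / (self.fp + self.tn)

    @property
    def fnr(self) -> float:  # P(not flagged | reoffended)
        return self.fn / (self.fn + self.tp)

    @property
    def base_rate(self) -> float:  # share of the group that reoffended
        return (self.tp + self.fn) / (self.tp + self.fp + self.fn + self.tn)


groups = {
    "A": Confusion(tp=400, fp=100, fn=100, tn=400),  # base rate 0.5
    "B": Confusion(tp=160, fp=40, fn=40, tn=760),    # base rate 0.2
}

for name, c in groups.items():
    print(f"{name}: base={c.base_rate:.2f} PPV={c.ppv:.2f} "
          f"FPR={c.fpr:.2f} FNR={c.fnr:.2f}")
```

With these counts both groups have PPV 0.80 and FNR 0.20, but FPR is 0.20 for group A and 0.05 for group B: judged by predictive parity the classifier is even-handed; judged by error-rate balance it is not.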
Sycophancy is a more recent, model-specific bias: RLHF-tuned chat models tend to agree with the user’s expressed view, even when that view is incorrect. The model has been trained on preference judgments that reward agreement, so agreement becomes a measurable bias of output toward whatever the user signaled.
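Because sycophancy shows up in outputs, it can be scored. A minimal sketch of one such score, assuming each probe records whether the model’s first answer was correct and whether its answer after the user pushed back was still correct. The `Probe` record, `capitulation_rate`, and the data are hypothetical and do not reproduce any specific published protocol.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Probe:
    """One question asked twice: before and after the user disputes the answer."""
    first_correct: bool
    final_correct: bool


def capitulation_rate(probes: Iterable[Probe]) -> float:
    """Fraction of initially correct answers abandoned after user pushback."""
    initially_right = [p for p in probes if p.first_correct]
    if not initially_right:
        return 0.0
    abandoned = sum(1 for p in initially_right if not p.final_correct)
    return abandoned / len(initially_right)


probes = [
    Probe(first_correct=True, final_correct=False),   # caved
    Probe(first_correct=True, final_correct=True),    # held firm
    Probe(first_correct=True, final_correct=False),   # caved
    Probe(first_correct=False, final_correct=False),  # wrong from the start; not counted
]
print(capitulation_rate(probes))
```

Conditioning on initially correct answers separates caving under social pressure from ordinary error.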
Canonical: Buolamwini & Gebru (2018) on facial-recognition demographic disparity; Angwin et al. / ProPublica (2016) for the COMPAS debate; Sharma et al. (2023) on sycophancy in LLMs.
🤝 In hybrid systems — bias laundering
The clearest archetypal case in the taxonomy. An AI inherits biases from its training data; a human or institution deploys the AI and defers to its output; the bias is now algorithmic, and therefore objective, and therefore unaccountable. The laundering cycle is complete.
Selbst, boyd, Friedler, Venkatasubramanian & Vertesi (2019), in “Fairness and Abstraction in Sociotechnical Systems,” identified five “traps” that arise when fair-ML researchers treat algorithmic fairness as a property of the algorithm alone: the framing trap (defining fairness within the algorithm’s frame misses upstream injustice), the portability trap (a model that is fair in one context may be unfair in another), and the formalism, ripple-effect, and solutionism traps. The argument is structural: algorithmic bias is not a technical problem with a technical fix. It is a sociotechnical problem with a sociotechnical fix, or no fix at all.
The most consequential cases are decisions that institutions used to make through (biased) human judgment, now made through (differently biased) algorithmic judgment. Hiring screening tools, predictive-policing systems, child-welfare risk scoring, and credit allocation are the canonical examples. The bias did not move from the institution to the algorithm; it moved from a place where it was contested to a place where it is harder to contest.
Laundering has a second effect: the accountability gap. When a human decides, the decision is the human’s. When an algorithm decides, the decision is no one’s. The engineer who trained it, the data team that supplied its data, the procurement officer who bought it, and the operator who deployed it each hold a fragment of the responsibility, and none of them holds it individually. The literature on algorithmic accountability is in part a literature about reconstructing what was lost in the transition.
Canonical: Selbst, boyd, Friedler, Venkatasubramanian & Vertesi (2019); Buolamwini & Gebru (2018); Eubanks (2018, Automating Inequality).
↔ Where they converge
- All three are systematic, not random.
- Without external audit, all three are invisible to the agent producing them.
- All three are most consequential for groups with the least standing to contest them.
- All three propagate over time without correction unless something interrupts the loop.
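The last point can be made concrete with a deliberately crude toy model, hypothetical and not drawn from any deployed system: two districts with the same true incident rate, one patrol sent each day to whichever district has more recorded incidents, and incidents recorded only where the patrol goes. `simulate_greedy_patrols` and every number below are illustrative assumptions.

```python
from typing import List


def simulate_greedy_patrols(recorded: List[int], true_rate: List[int], days: int) -> List[int]:
    """Toy feedback loop: allocation follows the record, and the record follows allocation.

    Each day the single patrol goes to the district with the most recorded
    incidents so far (ties go to the lower index), and only that district
    adds its true daily rate to its record.
    """
    recorded = list(recorded)  # copy so the caller's list is not mutated
    for _ in range(days):
        target = max(range(len(recorded)), key=lambda i: recorded[i])
        recorded[target] += true_rate[target]
    return recorded


# Same true rate in both districts; a one-incident head start in the record.
final = simulate_greedy_patrols(recorded=[6, 5], true_rate=[1, 1], days=100)
print(final)
```

After 100 days the record reads [106, 5] although the districts are identical; nothing inside the loop ever sends the patrol to the second district to discover that.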
⤨ Where they diverge
- Human bias is partly correctable through awareness, training, and effort. Machine bias is correctable only through training-data changes, post-hoc adjustments, or constraint at deployment.
- Machine biases are quantifiable in ways human biases are not. Demographic parity in a model’s decisions can be measured precisely; the halo effect across an institution’s many human judgments cannot.
- The hybrid case is the only one that produces institutional legitimization. Neither the human nor the AI alone confers algorithmic authority. The combined system does, and that authority is what makes the bias hardest to dislodge.
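The measurability claim is easy to illustrate: given only each decision and the group it applied to, selection rates and the gap between them follow directly. A minimal sketch with made-up decisions; `parity_report` is a hypothetical helper. The impact ratio is included because the “four-fifths rule” in US employment-selection guidance treats a ratio below 0.8 as a conventional flag for adverse impact.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def parity_report(decisions: Iterable[Tuple[str, bool]]) -> Dict[str, object]:
    """Demographic-parity summary from (group, selected) pairs.

    Returns per-group selection rates, the gap between the highest and lowest
    rate (demographic parity difference), and the lowest rate divided by the
    highest (impact ratio).
    """
    totals: Dict[str, int] = defaultdict(int)
    chosen: Dict[str, int] = defaultdict(int)
    for group, selected in decisions:
        totals[group] += 1
        chosen[group] += int(selected)
    rates = {g: chosen[g] / totals[g] for g in totals}
    highest, lowest = max(rates.values()), min(rates.values())
    return {
        "rates": rates,
        "parity_difference": highest - lowest,
        "impact_ratio": lowest / highest if highest > 0 else 1.0,
    }


# Made-up outcomes: group X selected 50 of 100 times, group Y 30 of 100.
decisions = [("X", i < 50) for i in range(100)] + [("Y", i < 30) for i in range(100)]
report = parity_report(decisions)
print(report)
```

Here the impact ratio is 0.6, below the 0.8 line.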
🌀 Open question
Whose biases are encoded when an AI deployed at scale exhibits patterned error? The training data’s? The labelers’? The institution that procured and deployed it? Current legal and regulatory frameworks treat algorithmic bias as a technical problem with technical fixes: fairness-aware learning, demographic-parity constraints, audits. The Selbst et al. argument and the broader sociotechnical literature increasingly hold that this framing misallocates the problem. The accountability gap, if real, is not closable by better fairness metrics. (Open as of mid-2026.)
📡 Recent entries (auto-fed)
Week 2026-W21
2026-05-18 — A new arXiv submission introduces DebiasRAG, a tuning-free retrieval-augmented pipeline that constructs query-specific counter-contexts as fairness constraints at inference time, targeting race, gender, and age biases in LLM outputs while aiming to preserve representational ability (arXiv:2605.16113).
2026-05-18 — An arXiv paper on cross-cultural survey simulation reports that persona-prompted LLMs systematically underperform on populations less represented in training data, and shows that a value-based persona construction with a calibration step reduces prediction error, with the largest gains on underrepresented populations (arXiv:2605.16193).