Bias mitigation in semantic search
Retrieval-augmented search sounds clean: retrieve, then generate from what you retrieved. In practice, both steps are sensitive to the exact wording of the query string. If a user’s wording encodes race, gender, age, religion, or similar dimensions—and the system blindly embeds that string, filters on it, or asks an LLM to paraphrase it—you get two problems at once: outcomes can track attributes you never meant to