What Datasets Are Used for Bias Evaluation? – RChilli HelpDesk

RChilli uses a combination of dataset types, each serving a different purpose in evaluation.

Synthetic datasets

These are controlled test datasets in which only one variable is changed at a time, such as name, graduation year, pronouns, or career gap indicators, while skills, experience, and education remain constant. Synthetic data is useful because it isolates the effect of a specific attribute: if the output changes between two otherwise identical resumes, the change can be traced to that one attribute.
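
As an illustration only, the one-variable-at-a-time idea can be sketched in Python. The field names, sample values, and the helpers make_variants and changed_fields are assumptions made for this example. They do not describe RChilli's internal schema or tooling.

```python
from copy import deepcopy

# Hypothetical base resume; the field names are illustrative, not RChilli's schema.
BASE_RESUME = {
    "name": "Alex Morgan",
    "pronouns": "they/them",
    "graduation_year": 2015,
    "career_gap_months": 0,
    "skills": ["Python", "SQL"],
    "experience_years": 8,
    "education": "BSc Computer Science",
}

# Attributes to vary, each with the alternative values to test (all made up).
VARIATIONS = {
    "name": ["Priya Sharma", "John Smith"],
    "graduation_year": [1995, 2020],
    "pronouns": ["she/her", "he/him"],
    "career_gap_months": [18],
}


def make_variants(base, variations):
    """Return (attribute, variant) pairs; each variant differs from base in one field."""
    variants = []
    for attribute, values in variations.items():
        for value in values:
            variant = deepcopy(base)  # keep skills, experience, education untouched
            variant[attribute] = value
            variants.append((attribute, variant))
    return variants


def changed_fields(base, variant):
    """List the fields whose values differ between base and variant."""
    return [key for key in base if base[key] != variant[key]]


variants = make_variants(BASE_RESUME, VARIATIONS)
```

Each variant can then be sent through the system under test alongside the base resume, and any difference in output compared against the single attribute that was changed.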

Anonymized real-world data

RChilli also evaluates system behavior using historical or representative resumes where personal identifiers are removed, masked, or excluded. This helps validate fairness in production-like conditions with realistic variation in writing style, layout, industry language, and experience structure.
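
A minimal sketch of masking personal identifiers is shown below, assuming simple pattern-based rules. The patterns, the placeholder tokens, and the mask_identifiers function are hypothetical and are not RChilli's anonymization process. Regular expressions alone miss many identifiers, so this only illustrates the idea.

```python
import re

# Simplified patterns for illustration; real resumes need far broader coverage.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d ().-]{7,}\d")


def mask_identifiers(text, names=()):
    """Replace emails, phone-like numbers, and the given names with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    for name in names:
        text = re.sub(re.escape(name), "[NAME]", text, flags=re.IGNORECASE)
    return text


sample = (
    "Jane Doe\n"
    "jane.doe@example.com | +1 (555) 010-2345\n"
    "5 years of Python experience"
)
masked = mask_identifiers(sample, names=["Jane Doe"])
```

The writing style, layout, and experience details stay intact, which is what keeps the data representative of production-like conditions.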

Domain-specific datasets

Bias and fairness are also assessed across multiple job families, such as IT, healthcare, finance, and administrative roles. This is important because the behavior of matching systems can vary by domain, role type, and qualification pattern.
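
To show why per-domain assessment matters, the sketch below compares average match scores for a baseline group and a variant group within each job family. The records, the scores, and the score_gap_by_domain function are invented for this example, and the mean score gap is one possible metric rather than a description of RChilli's methodology.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (job_family, group, match_score). All values are made up.
RESULTS = [
    ("IT", "baseline", 0.82), ("IT", "variant", 0.80),
    ("IT", "baseline", 0.78), ("IT", "variant", 0.76),
    ("healthcare", "baseline", 0.75), ("healthcare", "variant", 0.65),
    ("finance", "baseline", 0.70), ("finance", "variant", 0.70),
]


def score_gap_by_domain(results):
    """Return mean(baseline) - mean(variant) match score for each job family."""
    scores = defaultdict(lambda: defaultdict(list))
    for family, group, score in results:
        scores[family][group].append(score)
    return {
        family: mean(groups["baseline"]) - mean(groups["variant"])
        for family, groups in scores.items()
    }


gaps = score_gap_by_domain(RESULTS)
```

A gap that is small when averaged over all resumes can still be large within one job family, which is why results are broken down by domain.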

Using all three dataset categories balances controlled testing with real-world variation and industry coverage.

If you need further assistance, feel free to contact the RChilli Support Team by sending an email to support@rchilli.com.