Benchmark

Australia 1k

An evaluation set for assessing whether models align with Australian cultural norms, language, history, institutions, and media. It was used to benchmark how well SCX MAGPiE aligns with the Australian context.

What it measures.

Australia 1k is designed to test localization that goes beyond spelling. It asks whether a model can reason from Australian assumptions rather than simply mimic Australian phrasing.

Language: Australian English spelling, idioms, and expected tone.
Institutions: ASIC, APRA, Medicare, tax, superannuation, government services, and education context.
Culture: Australian norms, media references, local market dynamics, and civic assumptions.
History: Questions that require Australian historical and public-context reasoning.

| Evaluation axis | Generic model risk | MAGPiE target |
| --- | --- | --- |
| Legal and regulatory context | US-centric assumptions or broad disclaimers | Australian legal and compliance framing |
| Language | Surface-level Australian slang or inconsistent spelling | Natural Australian English with domain-appropriate tone |
| Culture and media | Foreign references or hallucinated local context | Grounded Australian cultural and media awareness |

Placeholder

Dataset card and report links go here.

Add Hugging Face, GitHub, paper, methodology, and leaderboard links once public.
