Benchmark

Australia 1k

An evaluation set for assessing whether models align with Australian cultural norms, language, history, institutions, and media. It was used to benchmark how well SCX MAGPiE aligns with the Australian context.

What it measures

Australia 1k is designed for localization beyond spelling. It asks whether a model can reason through Australian assumptions, not simply mimic Australian phrasing.

Language: Australian English spelling, idioms, and expected tone.

Institutions: ASIC, APRA, Medicare, tax, superannuation, government services, and education context.

Culture: Australian norms, media references, local market dynamics, and civic assumptions.

History: Questions that require Australian historical and public-context reasoning.

Evaluation axis | Generic model risk | MAGPiE target
Legal and regulatory context | US-centric assumptions or broad disclaimers | Australian legal and compliance framing
Language | Surface-level Australian slang or inconsistent spelling | Natural Australian English with domain-appropriate tone
Culture and media | Foreign references or hallucinated local context | Grounded Australian cultural and media awareness

Placeholder

Dataset card and report links go here.

Add Hugging Face, GitHub, paper, methodology, and leaderboard links once public.
