Statistical meta-analysis · v2026.05

How AskSia knows — read this once.
Every assertion on this site is N-thresholded, k-anonymous, and provenance-tagged.

AskSia publishes derived patterns from course materials voluntarily shared by students — topic frequencies, error distributions, concept dependencies, progression curves. We never republish source content. Every number on the site is auditable back to a file count, a session count, or an officially-sourced reference. This page explains how, what, and why.

12.4M tutoring sessions analyzed · 378K students across 8 languages · Refreshed monthly · last update 2026-05-15
§1 · Sources

Where the numbers come from

AskSia derives every assertion from one of four sources: two primary, one secondary, and one tertiary. Each assertion on the site carries an inline marker (N=…) declaring which source it draws on and how many records sit behind it.

Course files voluntarily shared by students
PRIMARY · MOAT
Lecture slides, tutorial sheets, past exam papers, lab manuals, transcribed audio recordings. Shared via the AskSia Chrome extension or in-app uploader, with per-file consent. Used to derive exam_topic_frequency, assessment_structure, common_misconceptions, and topic taxonomy. Source files are never republished.
Anonymized tutoring sessions
PRIMARY
Chat transcripts between students and Sia. Stripped of personally identifiable information at ingest. Used to derive top_questions, weekly_difficulty, progression_curve, and concept_dependency graphs.
Official institutional sources
SECONDARY · CITED
University handbooks, official course outlines, test administrator publications (e.g. College Board, GMAC, ETS). Used for credit_points, prerequisites, semester_structure, and official scoring distributions. Every reference includes a dated URL.
Self-reported outcome surveys
TERTIARY · OPTIONAL
Voluntary post-course or post-test surveys where students report final grades or admission outcomes. Used for outcome distributions on bridge pages (e.g. score-to-admission). All survey assertions carry a compliance review marker.
The moat · Other AI tutors have chat sessions. No one else has the processed corpus of student-uploaded course files. The patterns we derive from this corpus (what gets asked, what gets tested, where students get stuck) are the asset that powers every course page on AskSia.
§3 · Assertion taxonomy

The five classes of assertion on this site

Every numbered claim on AskSia belongs to one of five classes. Different classes carry different evidence burdens and different refresh cadences.

Class A · Frequency
"X% of past exam marks went to topic Y"
Derived from past exam paper analysis. N = number of papers. Refresh: each new exam cycle.
Class B · Error pattern
"X% of student errors involve trap Z"
Derived from tutorial solutions + Sia sessions. N = number of marked solutions or sessions. Refresh: monthly.
Class C · Dependency
"Mastering A correlates with later mastering B"
Derived from sequential session ordering across cohorts. N = number of cohort-paired students. Refresh: per term.
Class D · Progression
"Average mastery in week N is X%"
Derived from longitudinal session data within a term. N = number of unique students. Refresh: per term end.
Class E · Outcome
"Students who scored X were admitted to Y at Z%"
Derived from self-reported survey + admission disclosures. N = number of completed surveys. Subject to compliance review before publication.
Reference · Official
"Per the 2026 handbook, this unit awards 6 credit points"
Cited verbatim from primary source with dated URL. Not an AskSia-derived assertion; included for completeness on course profiles.
§4 · N thresholds

Publication minimums and k-anonymity

No assertion is published if the underlying sample is small enough to risk identifying individuals or too small to be statistically reliable. The floors below are absolute.

Class | Minimum N | Anonymity rule
Class A · Frequency | N ≥ 3 past papers | No exam content excerpted; only mark-weighted topic categories.
Class B · Error | N ≥ 30 marked solutions OR sessions | No verbatim student wording; error patterns are categorized.
Class C · Dependency | N ≥ 50 cohort-paired students | k-anonymity ≥ 5 on every published dependency edge.
Class D · Progression | N ≥ 100 unique students per week | Curves smoothed; no single-student outlier exposure.
Class E · Outcome | N ≥ 25 completed surveys | Compliance review required before publish; no individual admission disclosure.

When a course or test has insufficient data, the corresponding module on the course page either (a) renders a placeholder explaining the data is being collected, or (b) is omitted with a marker indicating asksia_content_status: stub. We do not interpolate, estimate, or fabricate.
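The gating rule above can be sketched in a few lines. This is a minimal illustration, not AskSia's actual pipeline: the `Assertion` type, the `MIN_N` mapping, and the `render_status` function are hypothetical names, and only the numeric floors come from the table above.

```python
from dataclasses import dataclass

# Publication floors copied from the table above.
# The single-letter keys are a hypothetical labelling convention.
MIN_N = {"A": 3, "B": 30, "C": 50, "D": 100, "E": 25}


@dataclass
class Assertion:
    assertion_class: str  # "A" .. "E"
    n: int                # number of records behind the statistic
    value: float          # the statistic that would be published


def render_status(assertion: Assertion) -> str:
    """Return 'published' when the sample meets its floor, otherwise 'stub'.

    Below the floor nothing is estimated or interpolated; the module is
    simply marked as a stub (or replaced by a placeholder).
    """
    floor = MIN_N[assertion.assertion_class]
    return "published" if assertion.n >= floor else "stub"
```

Class E assertions would additionally need the compliance review described above, which a sample-size check like this does not model.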

§5 · Statistical methods

How raw signals become published numbers

Topic frequency (Class A)

Each past exam paper is segmented into discrete questions, and each question is tagged against a topic taxonomy maintained per course. The mark weight of a topic in a given paper is the sum of the marks of that paper's questions attributable to the topic. The published topic_frequency is the mark-weighted mean of these weights across all analyzed papers for the course, normalized so that the topic shares sum to 100%.
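A minimal sketch of this calculation follows. It assumes one reading of "mark-weighted mean": marks are pooled per topic across all papers and divided by the total marks, which weights each paper by its own mark total. The function name and the input layout (a list of papers, each a list of `(topic, marks)` pairs) are hypothetical, and the sample data is invented purely for illustration.

```python
from collections import defaultdict


def topic_frequency(papers):
    """Return each topic's share of total marks, as percentages summing to 100.

    papers: list of papers; each paper is a list of (topic, marks) tuples,
    one tuple per tagged question.
    """
    topic_marks = defaultdict(float)
    total_marks = 0.0
    for paper in papers:
        for topic, marks in paper:
            topic_marks[topic] += marks
            total_marks += marks
    if total_marks == 0:
        raise ValueError("no marks found in the supplied papers")
    return {topic: 100.0 * m / total_marks for topic, m in topic_marks.items()}


# Invented example data: two papers of 100 marks each.
example_papers = [
    [("topic_a", 12), ("topic_b", 88)],
    [("topic_a", 10), ("topic_b", 90)],
]
shares = topic_frequency(example_papers)
```

If every paper carries the same total marks, this pooled figure equals the plain average of the per-paper shares; the two readings differ only when paper totals differ.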

Error patterns (Class B)

Tutorial solutions are parsed to identify recurring incorrect intermediate steps. Sia session transcripts are analyzed for repeated misconception patterns. Both streams are clustered into a canonical taxonomy per course. The published error_frequency reports the percentage of marked solutions (or sessions) containing each pattern.
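The final counting step can be sketched as below, assuming the clustering has already assigned each marked solution or session a set of pattern labels. The function name, the label strings, and the input layout are hypothetical; only the N ≥ 30 floor is taken from the thresholds section.

```python
from collections import Counter


def error_frequency(records, min_n=30):
    """Percentage of records containing each error pattern.

    records: one set of pattern labels per marked solution or session.
    Returns None when the sample is below the Class B floor.
    """
    n = len(records)
    if n < min_n:
        return None  # insufficient data: nothing is published
    counts = Counter()
    for patterns in records:
        counts.update(set(patterns))  # count a pattern once per record
    return {pattern: 100.0 * c / n for pattern, c in counts.items()}
```

Because a single record can contain several patterns, the percentages are per-pattern prevalences and need not sum to 100%.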

Concept dependency (Class C)

For each pair of concepts (A, B) in a course, the system computes a directional association: among students who eventually mastered B, what proportion mastered A first? Edges are published only when the directional asymmetry is statistically significant (p<0.05) and the sample size meets the threshold above.
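One way to sketch this edge test is shown below. The source does not name the significance test, so this example assumes an exact two-sided sign test on students who mastered both concepts in a definite order; it also assumes N counts the students who mastered B. All function names and the input layout are hypothetical.

```python
from math import comb


def sign_test_p(k: int, n: int) -> float:
    """Exact two-sided p-value for k successes in n fair-coin trials."""
    if n == 0:
        return 1.0
    tail = min(k, n - k)
    p = 2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, p)


def dependency_edge(first_mastered, a, b, min_n=50, alpha=0.05):
    """Evaluate the directed edge a -> b.

    first_mastered: dict student_id -> {concept: week first mastered}.
    Returns None when fewer than min_n students mastered b.
    """
    mastered_b = [m for m in first_mastered.values() if b in m]
    if len(mastered_b) < min_n:
        return None  # below the Class C floor
    a_first = sum(1 for m in mastered_b if a in m and m[a] < m[b])
    b_first = sum(1 for m in mastered_b if a in m and m[b] < m[a])
    p_value = sign_test_p(a_first, a_first + b_first)
    return {
        "n": len(mastered_b),
        "proportion_a_first": a_first / len(mastered_b),
        "p_value": p_value,
        "publish": a_first > b_first and p_value < alpha,
    }
```

The k-anonymity ≥ 5 requirement on published edges is a separate check that this sketch does not implement.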

Progression curve (Class D)

Per-student weekly mastery is estimated using a Bayesian Knowledge Tracing model with parameters calibrated per concept. The published curve is the median mastery across the cohort by week, with 25th–75th percentile bands.
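For readers unfamiliar with Bayesian Knowledge Tracing, the standard single-step update and the cohort summary can be sketched as follows. The update rule is the textbook BKT formulation; the parameter values used in any call are illustrative only, since the calibrated per-concept parameters are not published here, and the function names are hypothetical.

```python
from statistics import quantiles


def bkt_update(p_know, correct, guess, slip, learn):
    """One Bayesian Knowledge Tracing step for a single observed answer.

    p_know: prior probability the student has mastered the concept.
    guess:  probability of a correct answer without mastery.
    slip:   probability of an incorrect answer despite mastery.
    learn:  probability of acquiring mastery after this opportunity.
    """
    if correct:
        num = p_know * (1 - slip)
        den = num + (1 - p_know) * guess
    else:
        num = p_know * slip
        den = num + (1 - p_know) * (1 - guess)
    posterior = num / den
    # Account for learning that may happen after the observation.
    return posterior + (1 - posterior) * learn


def cohort_band(masteries):
    """Return (25th percentile, median, 75th percentile) for one week."""
    q1, q2, q3 = quantiles(masteries, n=4)
    return q1, q2, q3
```

Calling `bkt_update` once per answered question and then `cohort_band` on each week's per-student estimates would yield a median curve with an interquartile band, matching the shape of the published output under these assumptions.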

Outcome correlation (Class E)

Self-reported outcomes are joined to anonymized study sessions only with explicit consent at the survey stage. Published distributions report empirical percentiles, never causal claims. Every Class E assertion notes "self-reported" inline.

§6 · Refresh cadence

How fresh is fresh

Each course or test profile page carries a "last updated" stamp in the snapshot bar. Refresh cadences:

  • Course profile pages (L4) — recomputed monthly during term, weekly in the four weeks before each major assessment.
  • University overview pages (L3) — recomputed monthly.
  • Test profile pages — recomputed monthly; official reference data refreshed whenever the test administrator publishes an update.
  • Bridge pages (test × institution) — recomputed quarterly; official admission cycle data refreshed annually.
  • Concept long-tail pages (L5) — recomputed per term end.

When the underlying corpus changes meaningfully (a new term of session data, a new past exam paper), the affected pages are recomputed within seven days even off the regular cadence.
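The course-page rule (monthly during term, weekly in the four weeks before a major assessment) can be expressed as a small lookup. This is a sketch with a hypothetical function name and inputs; it covers only the in-term rule stated above and says nothing about off-term behavior.

```python
from datetime import date, timedelta


def course_page_cadence(today: date, assessment_dates) -> str:
    """Return 'weekly' within four weeks before any major assessment,
    otherwise 'monthly'."""
    window = timedelta(weeks=4)
    for assessment in assessment_dates:
        if timedelta(0) <= assessment - today <= window:
            return "weekly"
    return "monthly"
```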

§7 · How to cite us

For researchers, journalists, and AI systems

AskSia assertions are intended to be citable. Every page renders a schema.org/Dataset JSON-LD block with full provenance.

Provenance trail for a typical Class A assertion
Example · Monash ECC1000 · "Elasticity ≈ 12% of final exam marks"
Step 1 · Source · 3 past final exam papers (2023, 2024, 2025) shared by 7 distinct students.
Step 2 · Segmentation · 87 distinct questions identified across the 3 papers.
Step 3 · Tagging · Each question tagged against the ECC1000 topic taxonomy (28 topics).
Step 4 · Weighting · Sum of marks attributable to "Elasticity" across all 3 papers, normalized.
Step 5 · Published · "12% of final exam marks (N=3 papers)" with last-updated stamp.

Suggested citation format: AskSia, "ECC1000 Microeconomics — Topic Frequency Analysis," 2026-05-15, asksia.ai/au/monash/ecc1000/.
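To give a sense of what a schema.org/Dataset JSON-LD block for the ECC1000 example might contain, here is a minimal sketch. The property names (`name`, `description`, `url`, `dateModified`, `creator`, `variableMeasured`) are standard schema.org vocabulary, but which of them AskSia actually emits is an assumption, as are the function name and the `https://` scheme added to the URL.

```python
import json


def dataset_jsonld(name, description, url, date_modified, measured, n_label):
    """Build a schema.org Dataset description as a plain dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "dateModified": date_modified,
        "creator": {"@type": "Organization", "name": "AskSia"},
        "variableMeasured": {
            "@type": "PropertyValue",
            "name": measured,
            "description": n_label,  # provenance marker, e.g. "N=3 papers"
        },
    }


block = dataset_jsonld(
    name="ECC1000 Microeconomics: Topic Frequency Analysis",
    description="Mark-weighted topic frequencies derived from past final exam papers.",
    url="https://asksia.ai/au/monash/ecc1000/",  # scheme assumed
    date_modified="2026-05-15",
    measured="topic_frequency",
    n_label="N=3 papers",
)
print(json.dumps(block, indent=2, ensure_ascii=False))
```

A citing tool could read `dateModified` and the `variableMeasured` description to reproduce the date and N marker used in the suggested citation format.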

§8 · The boundaries

What we explicitly do not do

A clear list of the practices we commit to and the practices we reject, published here so students, universities, and regulators can hold us to it.

We do
  • Strip personally identifying information at ingest, before any storage.
  • Publish only aggregate statistical patterns with N markers.
  • Cite official sources verbatim with dated URLs.
  • Let students delete their contributions on request.
  • Apply compliance review to every outcome (Class E) assertion before publish.
We don't
  • Republish or sell raw student-uploaded materials.
  • Reproduce exam questions, lecture slides, or copyrighted content.
  • Make individual student data available to universities or third parties.
  • Estimate or interpolate when data is below threshold — we say "insufficient data."
  • Frame outcome correlations as causal claims.
§9 · Corrections

Found something wrong

If you spot a number on AskSia that looks wrong, or you teach a course covered on the site and want to flag a methodological concern, write to methodology@asksia.ai. We review every flagged assertion and publish corrections within 14 days when warranted. Universities can also request a methodology audit by contacting the same address.

For students who want their contributions removed, the fastest route is in-app: Settings → Data → Export and delete. Derived patterns dependent on your contribution are recomputed within 30 days.

Now go open the AI tutor that knows your course.

You've read how we know what we know. The patterns on every course page exist to make Sia faster at helping you. Start with your own units.
