How AskSia Knows · Methodology

§1 · Sources

Where the numbers come from

AskSia derives every assertion from one of four sources: two primary, one secondary, and one tertiary. Each assertion on the site carries an inline marker (N=…) declaring which source it comes from and how many records it is based on.

Course files voluntarily shared by students

PRIMARY · MOAT

Lecture slides, tutorial sheets, past exam papers, lab manuals, transcribed audio recordings. Shared via the AskSia Chrome extension or in-app uploader, with per-file consent. Used to derive exam_topic_frequency, assessment_structure, common_misconceptions, and topic taxonomy. Source files are never republished.

Anonymized tutoring sessions

PRIMARY

Chat transcripts between students and Sia. Stripped of personally identifiable information at ingest. Used to derive top_questions, weekly_difficulty, progression_curve, and concept_dependency graphs.

Official institutional sources

SECONDARY · CITED

University handbooks, official course outlines, test administrator publications (e.g. College Board, GMAC, ETS). Used for credit_points, prerequisites, semester_structure, and official scoring distributions. Every reference includes a dated URL.

Self-reported outcome surveys

TERTIARY · OPTIONAL

Voluntary post-course or post-test surveys where students report final grades or admission outcomes. Used for outcome distributions on bridge pages (e.g. score-to-admission). All survey assertions carry a compliance review marker.

The moat

Other AI tutors have chat sessions. No one else has the processed corpus of student-uploaded course files. The patterns we derive from this corpus (what gets asked, what gets tested, and where students get stuck) are the asset that powers every course page on AskSia.

§2 · Consent

Consent, terms, and student rights

Every student who shares a file or completes a tutoring session accepts AskSia's Terms of Service and Privacy Policy. Specifically:

Files are processed for the purpose of deriving anonymized statistical patterns. No source file content is republished, sold, or shared with third parties.
Personally identifiable information is stripped at ingest. The system does not store student names, emails, or student IDs alongside derived patterns.
Students may request deletion of their session history at any time via in-app settings. After deletion, derived aggregate patterns are retained only where the student's contribution is statistically dilute (N≥50).
Students retain copyright on materials they share. The license granted to AskSia is non-exclusive, revocable on request, and limited to internal pattern extraction.

If a student requests removal of a specific contribution, any derived assertion with N<50 traceable to that contribution is recomputed within 30 days.
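The removal rule above reduces to a filter over derived assertions. A minimal sketch, assuming each assertion stores its N and the IDs of the contributions it was computed from; the `Assertion` record, its fields, and the example IDs are hypothetical, not AskSia's actual schema:

```python
from dataclasses import dataclass, field

# Assertions below this N are recomputed when a traceable contribution is removed.
DILUTION_FLOOR = 50

@dataclass
class Assertion:
    # Hypothetical record: assertion ID, sample size N, and source contribution IDs.
    assertion_id: str
    n: int
    contribution_ids: set[str] = field(default_factory=set)

def assertions_to_recompute(assertions: list[Assertion], removed_id: str) -> list[str]:
    """IDs of assertions that must be recomputed after `removed_id` is deleted."""
    return [
        a.assertion_id
        for a in assertions
        if removed_id in a.contribution_ids and a.n < DILUTION_FLOOR
    ]

catalog = [
    Assertion("ecc1000-elasticity", n=3, contribution_ids={"c1", "c2"}),
    Assertion("ecc1000-traps", n=120, contribution_ids={"c1"}),
]
print(assertions_to_recompute(catalog, "c1"))  # ['ecc1000-elasticity']
```

The second assertion also traces back to `c1`, but at N=120 the contribution is dilute, so it is left alone.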

§3 · Assertion taxonomy

The five classes of assertion on this site

Every numbered claim on AskSia belongs to one of five classes. Different classes carry different evidence burdens and different refresh cadences.

Class A · Frequency

"X% of past exam marks went to topic Y"

Derived from past exam paper analysis. N = number of papers. Refresh: each new exam cycle.

Class B · Error pattern

"X% of marked solutions (or sessions) contain trap Z"

Derived from tutorial solutions and Sia sessions. N = number of marked solutions or sessions. Refresh: monthly.

Class C · Dependency

"Mastering A correlates with later mastery of B"

Derived from sequential session ordering across cohorts. N = number of cohort-paired students. Refresh: per term.

Class D · Progression

"Median mastery in week W is X%"

Derived from longitudinal session data within a term. N = number of unique students. Refresh: at each term end.

Class E · Outcome

"Students who scored X were admitted to Y at Z%"

Derived from self-reported surveys and admission disclosures. N = number of completed surveys. Subject to compliance review before publication.

Reference · Official

"Per the 2026 handbook, this unit awards 6 credit points"

Cited verbatim from the primary source with a dated URL. Not an AskSia-derived assertion; included for completeness on course profiles.

§4 · N thresholds

Publication minimums and k-anonymity

No assertion is published if the underlying sample is small enough to risk identifying individuals or to be statistically unreliable. The floors below are absolute.

Class	Minimum N	Anonymity rule
Class A · Frequency	N ≥ 3 past papers	No exam content excerpted; only mark-weighted topic categories.
Class B · Error	N ≥ 30 marked solutions OR sessions	No verbatim student wording; error patterns are categorized.
Class C · Dependency	N ≥ 50 cohort-paired students	k-anonymity ≥ 5 on every published dependency edge.
Class D · Progression	N ≥ 100 unique students per week	Curves smoothed; no single-student outlier exposure.
Class E · Outcome	N ≥ 25 completed surveys	Compliance review required before publish; no individual admission disclosure.

When a course or test has insufficient data, the corresponding module on the course page either (a) renders a placeholder explaining the data is being collected, or (b) is omitted with a marker indicating asksia_content_status: stub. We do not interpolate, estimate, or fabricate.
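The floors in the table, together with the stub fallback, can be read as a single publication gate. This is a hedged sketch: the `Candidate` record, the `k` field (taken here to mean the smallest group size behind a Class C edge), and the `pending_review` status are illustrative assumptions; only the thresholds and the `stub` value come from this page.

```python
from dataclasses import dataclass
from typing import Optional

# Publication floors, copied from the table above.
MIN_N = {"A": 3, "B": 30, "C": 50, "D": 100, "E": 25}
MIN_K_CLASS_C = 5  # k-anonymity floor on every published dependency edge

@dataclass
class Candidate:
    assertion_class: str               # "A" through "E"
    n: int                             # sample size behind the assertion
    k: Optional[int] = None            # smallest group size on the edge (Class C only)
    compliance_reviewed: bool = False  # required before any Class E publish

def publication_status(c: Candidate) -> str:
    """Return "publish", "pending_review", or "stub" for a candidate assertion."""
    if c.n < MIN_N[c.assertion_class]:
        return "stub"  # below the floor: no interpolation or estimation
    if c.assertion_class == "C" and (c.k is None or c.k < MIN_K_CLASS_C):
        return "stub"
    if c.assertion_class == "E" and not c.compliance_reviewed:
        return "pending_review"
    return "publish"

print(publication_status(Candidate("A", n=3)))        # publish
print(publication_status(Candidate("C", n=80, k=4)))  # stub
print(publication_status(Candidate("E", n=40)))       # pending_review
```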

§5 · Statistical methods

How raw signals become published numbers

Topic frequency (Class A)

Each past exam paper is segmented into discrete questions. Each question is tagged to a topic taxonomy maintained per course. The mark weight assigned to a topic in a given paper is the sum of question marks within that paper attributable to that topic. The published topic_frequency is the mark-weighted mean across all analyzed papers for that course, normalized to 100%.
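Read literally, weighting each paper's topic share by that paper's total marks is the same as pooling marks across papers and normalizing once, which also matches Step 4 of the provenance trail in §7. A minimal sketch under that reading; the input shape (one `(topic, marks)` pair per tagged question, so a question split across two topics contributes two pairs) and the toy papers are assumptions, not real ECC1000 data:

```python
from collections import defaultdict

def topic_frequency(papers: list[list[tuple[str, float]]]) -> dict[str, float]:
    """Mark-weighted topic share across all papers, normalized to 100%.

    Each paper is a list of (topic, marks attributed to that topic) pairs.
    Pooling marks across papers equals the mean of per-paper shares
    weighted by each paper's total marks.
    """
    topic_marks: dict[str, float] = defaultdict(float)
    total = 0.0
    for paper in papers:
        for topic, marks in paper:
            topic_marks[topic] += marks
            total += marks
    if total == 0:
        return {}
    return {topic: 100.0 * m / total for topic, m in topic_marks.items()}

# Toy data for illustration only.
papers = [
    [("Elasticity", 10), ("Demand", 30), ("Supply", 60)],
    [("Elasticity", 20), ("Demand", 40), ("Supply", 40)],
]
print(topic_frequency(papers))  # {'Elasticity': 15.0, 'Demand': 35.0, 'Supply': 50.0}
```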

Error patterns (Class B)

Tutorial solutions are parsed to identify recurring incorrect intermediate steps. Sia session transcripts are analyzed for repeated misconception patterns. Both streams are clustered into a canonical taxonomy per course. The published error_frequency reports the percentage of marked solutions (or sessions) containing each pattern.
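Once solutions and sessions have been clustered into canonical labels (the clustering step itself is not shown), the published percentage is a per-solution count. A sketch assuming each marked solution arrives as a set of hypothetical pattern labels:

```python
from collections import Counter

def error_frequency(solutions: list[set[str]]) -> dict[str, float]:
    """Percent of marked solutions (or sessions) that contain each canonical error pattern.

    Each element is the set of pattern labels found in one solution, so a
    pattern is counted at most once per solution.
    """
    if not solutions:
        return {}
    counts = Counter(label for labels in solutions for label in labels)
    return {label: 100.0 * c / len(solutions) for label, c in counts.items()}

marked = [{"sign_error"}, {"sign_error", "unit_slip"}, set(), set()]
print(sorted(error_frequency(marked).items()))  # [('sign_error', 50.0), ('unit_slip', 25.0)]
```

Because a solution can contain several patterns or none, the percentages need not sum to 100.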

Concept dependency (Class C)

For each pair of concepts (A, B) in a course, the system computes a directional association: among students who eventually mastered B, what proportion mastered A first? Edges are published only when the directional asymmetry is statistically significant (p<0.05) and the sample size meets the threshold above.
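The page does not name the significance test. One reasonable reading, sketched here as an assumption: take the students who mastered both concepts in a clear order and apply a two-sided exact sign test of "A before B" against a 50/50 split. The `first_mastered` input shape, week-level timing, and treating the ordered pairs as the N are hypothetical; the k-anonymity check from §4 is omitted.

```python
from math import comb

def sign_test_p(k: int, n: int) -> float:
    """Two-sided exact binomial (sign) test of k successes in n trials against p = 0.5."""
    if n == 0:
        return 1.0
    lower = sum(comb(n, i) for i in range(k + 1)) / 2**n     # P(X <= k)
    upper = sum(comb(n, i) for i in range(k, n + 1)) / 2**n  # P(X >= k)
    return min(1.0, 2 * min(lower, upper))

def dependency_edge(first_mastered: dict[str, dict[str, int]], a: str, b: str,
                    min_n: int = 50, alpha: float = 0.05) -> tuple[float, float, bool]:
    """Evaluate the candidate edge A -> B.

    first_mastered maps each student to {concept: week the concept was first mastered}.
    Returns (share of B-masterers who mastered A first, p-value, publish?).
    """
    mastered_b = [w for w in first_mastered.values() if b in w]
    a_first = sum(1 for w in mastered_b if a in w and w[a] < w[b])
    b_first = sum(1 for w in mastered_b if a in w and w[b] < w[a])  # same-week ties dropped
    share = a_first / len(mastered_b) if mastered_b else 0.0
    n_pairs = a_first + b_first  # students with an ordered A/B pair
    p_value = sign_test_p(a_first, n_pairs)
    publish = n_pairs >= min_n and p_value < alpha and a_first > b_first
    return share, p_value, publish

# Toy cohort: 45 students mastered A first, 15 mastered B first.
students = {f"s{i}": {"A": 2, "B": 5} for i in range(45)}
students.update({f"t{i}": {"A": 6, "B": 3} for i in range(15)})
share, p, publish = dependency_edge(students, "A", "B")
print(share, publish)  # 0.75 True
```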

Progression curve (Class D)

Per-student weekly mastery is estimated using a Bayesian Knowledge Tracing model with parameters calibrated per concept. The published curve is the median mastery across the cohort by week, with 25th–75th percentile bands.
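A sketch of the standard four-parameter BKT update (prior mastery, learn, slip, guess) and of the median-with-quartiles summary. The parameter values, and how a student's weekly mastery is picked from the trace, are assumptions here; the page says only that parameters are calibrated per concept.

```python
import statistics
from dataclasses import dataclass

@dataclass
class BKTParams:
    p_init: float     # P(L0): concept already mastered before any practice
    p_transit: float  # P(T): chance of learning the concept at each opportunity
    p_slip: float     # P(S): wrong answer despite mastery
    p_guess: float    # P(G): right answer without mastery

def bkt_trace(observations: list[bool], prm: BKTParams) -> list[float]:
    """Mastery estimate after each observed attempt (standard BKT update)."""
    p = prm.p_init
    trace = []
    for correct in observations:
        if correct:
            evidence = p * (1 - prm.p_slip)
            posterior = evidence / (evidence + (1 - p) * prm.p_guess)
        else:
            evidence = p * prm.p_slip
            posterior = evidence / (evidence + (1 - p) * (1 - prm.p_guess))
        # Learning transition before the next opportunity.
        p = posterior + (1 - posterior) * prm.p_transit
        trace.append(p)
    return trace

def weekly_bands(mastery_by_week: dict[int, list[float]]) -> dict[int, tuple[float, float, float]]:
    """(25th percentile, median, 75th percentile) of per-student mastery, by week."""
    bands = {}
    for week, values in sorted(mastery_by_week.items()):
        q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        bands[week] = (q1, median, q3)
    return bands

params = BKTParams(p_init=0.2, p_transit=0.1, p_slip=0.1, p_guess=0.2)
print([round(p, 3) for p in bkt_trace([True, True, False, True], params)])
print(weekly_bands({1: [0.0, 0.25, 0.5, 0.75, 1.0]}))  # {1: (0.25, 0.5, 0.75)}
```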

Outcome correlation (Class E)

Self-reported outcomes are joined to anonymized study sessions only with explicit consent at the survey stage. Published distributions report empirical percentiles, never causal claims. Every Class E assertion notes "self-reported" inline.
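A sketch of two rules from this paragraph and from §4: the session join happens only for respondents who consented, and a score-band rate is published only at N ≥ 25 with a "self-reported" label. The `SurveyResponse` fields, the pseudonym key, and the label wording are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional

MIN_SURVEYS = 25  # Class E floor from the §4 table

@dataclass
class SurveyResponse:
    pseudonym: str         # anonymized key shared with study sessions (hypothetical)
    score: int             # self-reported test score
    admitted: bool         # self-reported admission outcome
    consent_to_link: bool  # explicit consent, given on the survey, to join with sessions

def link_to_sessions(responses: list[SurveyResponse], sessions: dict[str, dict]) -> list[tuple]:
    """Join survey responses to session data only where the respondent consented."""
    return [(r, sessions[r.pseudonym]) for r in responses
            if r.consent_to_link and r.pseudonym in sessions]

def admission_rate_label(responses: list[SurveyResponse], lo: int, hi: int) -> Optional[str]:
    """Empirical admission rate for scores in [lo, hi]; None means "insufficient data"."""
    band = [r for r in responses if lo <= r.score <= hi]
    if len(band) < MIN_SURVEYS:
        return None
    rate = 100.0 * sum(r.admitted for r in band) / len(band)
    # Descriptive only: no causal wording, and the source is flagged inline.
    return f"{rate:.0f}% admitted (self-reported, N={len(band)} surveys)"

responses = [SurveyResponse(f"p{i}", 700, i < 18, i % 2 == 0) for i in range(30)]
print(admission_rate_label(responses, 680, 720))  # 60% admitted (self-reported, N=30 surveys)
print(len(link_to_sessions(responses, {f"p{i}": {} for i in range(30)})))  # 15
```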

§6 · Refresh cadence

How fresh is fresh

Each course or test profile page carries a "last updated" stamp in the snapshot bar. Refresh cadences:

Course profile pages (L4) — recomputed monthly during term, weekly in the four weeks before each major assessment.
University overview pages (L3) — recomputed monthly.
Test profile pages — recomputed monthly; official reference data refreshed whenever the test administrator publishes an update.
Bridge pages (test × institution) — recomputed quarterly; official admission cycle data refreshed annually.
Concept long-tail pages (L5) — recomputed per term end.

When the underlying corpus changes meaningfully (a new term of session data, a new past exam paper), the affected pages are recomputed within seven days, even outside the regular cadence.
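The cadences above can be expressed as a small due-date rule. A sketch under stated assumptions: a month is approximated as 30 days and a quarter as 91, the four-week pre-assessment window is measured from the last update, and event-driven refreshes (term end, administrator publications, annual admission data) are left out. The page-type keys and the function name are hypothetical:

```python
from datetime import date, timedelta
from typing import Optional

# Regular cadences in days (assumption: month = 30 days, quarter = 91 days).
CADENCE_DAYS = {"course": 30, "university": 30, "test": 30, "bridge": 91}
PRE_ASSESSMENT_WINDOW_DAYS = 28  # course pages go weekly in the four weeks before an assessment
CORPUS_CHANGE_DAYS = 7           # off-cadence recompute after a meaningful corpus change

def next_due(page_type: str, last_updated: date,
             next_assessment: Optional[date] = None,
             corpus_changed: Optional[date] = None) -> date:
    """Latest date by which the page should be recomputed (hypothetical scheduler rule)."""
    interval = CADENCE_DAYS[page_type]
    if page_type == "course" and next_assessment is not None:
        if 0 <= (next_assessment - last_updated).days <= PRE_ASSESSMENT_WINDOW_DAYS:
            interval = 7
    due = last_updated + timedelta(days=interval)
    if corpus_changed is not None and corpus_changed >= last_updated:
        due = min(due, corpus_changed + timedelta(days=CORPUS_CHANGE_DAYS))
    return due

print(next_due("course", date(2026, 5, 1)))                                     # 2026-05-31
print(next_due("course", date(2026, 5, 1), next_assessment=date(2026, 5, 20)))  # 2026-05-08
print(next_due("bridge", date(2026, 1, 1), corpus_changed=date(2026, 1, 10)))   # 2026-01-17
```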

§7 · How to cite us

For researchers, journalists, and AI systems

AskSia assertions are intended to be citable. Every page renders a schema.org/Dataset JSON-LD block with full provenance.

Provenance trail for a typical Class A assertion

Example · Monash ECC1000 · "Elasticity ≈ 12% of final exam marks"

Step 1

Source · 3 past final exam papers (2023, 2024, 2025) shared by 7 distinct students.

Step 2

Segmentation · 87 distinct questions identified across the 3 papers.

Step 3

Tagging · Each question tagged against the ECC1000 topic taxonomy (28 topics).

Step 4

Weighting · Sum of marks attributable to "Elasticity" across all 3 papers, normalized.

Step 5

Published · "12% of final exam marks (N=3 papers)" with last-updated stamp.

Suggested citation format: AskSia, "ECC1000 Microeconomics — Topic Frequency Analysis," 2026-05-15, asksia.ai/au/monash/ecc1000/.
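For the ECC1000 example, the schema.org/Dataset JSON-LD block described at the start of this section might look roughly like the following. The property choices (`variableMeasured` as a `PropertyValue`, `measurementTechnique`) are standard schema.org vocabulary, but which fields AskSia actually emits is not stated here; the function name, the `name` string, and the `https://` scheme on the URL are illustrative assumptions.

```python
import json

def dataset_jsonld(name: str, url: str, date_modified: str, variable: str,
                   value: float, unit: str, sample: str, technique: str) -> dict:
    """Build a minimal schema.org Dataset object for one published assertion."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "url": url,
        "dateModified": date_modified,
        "creator": {"@type": "Organization", "name": "AskSia"},
        "measurementTechnique": technique,
        "variableMeasured": {
            "@type": "PropertyValue",
            "name": variable,
            "value": value,
            "unitText": unit,
            "description": sample,  # the N marker travels with the value
        },
    }

doc = dataset_jsonld(
    name="ECC1000 Microeconomics: Topic Frequency Analysis",
    url="https://asksia.ai/au/monash/ecc1000/",
    date_modified="2026-05-15",
    variable="Elasticity share of final exam marks",
    value=12,
    unit="percent",
    sample="N=3 past final exam papers (2023, 2024, 2025)",
    technique="Mark-weighted topic frequency across segmented, topic-tagged past exam papers",
)
# The serialized object would sit inside <script type="application/ld+json"> on the page.
print(json.dumps(doc, indent=2))
```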

§8 · The boundaries

What we explicitly do not do

A clear list of what we commit to and what we reject, published here so students, universities, and regulators can hold us to it.

We do

Strip personally identifying information at ingest, before any storage.
Publish only aggregate statistical patterns with N markers.
Cite official sources verbatim with dated URLs.
Let students delete their contributions on request.
Apply compliance review to every outcome (Class E) assertion before publish.

We don't

Republish or sell raw student-uploaded materials.
Reproduce exam questions, lecture slides, or copyrighted content.
Make individual student data available to universities or third parties.
Estimate or interpolate when data is below threshold — we say "insufficient data."
Frame outcome correlations as causal claims.

§9 · Corrections

Found something wrong

If you spot a number on AskSia that looks wrong, or you teach a course covered on the site and want to flag a methodological concern, write to methodology@asksia.ai. We review every flagged assertion and publish corrections within 14 days when warranted. Universities can also request a methodology audit by contacting the same address.

For students who want their contributions removed, the fastest route is in-app: Settings → Data → Export and delete. Derived patterns dependent on your contribution are recomputed within 30 days.

How AskSia knows — read this once.
Every assertion on this site is N-thresholded, k-anonymous, and provenance-tagged.

Now go open the AI tutor that knows your course.
