Where the numbers come from
AskSia derives every assertion from one of four primary sources. Each assertion on the site carries an inline marker (N=…) declaring which source and how many records.
- exam_topic_frequency, assessment_structure, common_misconceptions, and topic taxonomy. Source files are never republished.
- top_questions, weekly_difficulty, progression_curve, and concept_dependency graphs.
- credit_points, prerequisites, semester_structure, and official scoring distributions. Every reference includes a dated URL.
Consent, terms, and student rights
Every student who shares a file or completes a tutoring session accepts AskSia's Terms of Service and Privacy Policy. Specifically:
- Files are processed for the purpose of deriving anonymized statistical patterns. No source file content is republished, sold, or shared with third parties.
- Personally identifiable information is stripped at ingest. The system does not store student names, emails, or student IDs alongside derived patterns.
- Students may request deletion of their session history at any time via in-app settings. Derived aggregate patterns persist only when the contribution is statistically dilute (k≥50).
- Students retain copyright on materials they share. The license granted to AskSia is non-exclusive, revocable on request, and limited to internal pattern extraction.
If a student requests removal of a specific contribution, any derived assertion with N<50 traceable to that contribution is recomputed within 30 days.
The five classes of assertion on this site
Every numbered claim on AskSia belongs to one of five classes. Different classes carry different evidence burdens and different refresh cadences.
Publication minimums and k-anonymity
No assertion is published if the underlying sample is small enough to identify individuals or too small to be statistically reliable. The floors below are absolute.
| Class | Minimum N | Anonymity rule |
|---|---|---|
| Class A · Frequency | N ≥ 3 past papers | No exam content excerpted; only mark-weighted topic categories. |
| Class B · Error | N ≥ 30 marked solutions OR sessions | No verbatim student wording; error patterns are categorized. |
| Class C · Dependency | N ≥ 50 cohort-paired students | k-anonymity ≥ 5 on every published dependency edge. |
| Class D · Progression | N ≥ 100 unique students per week | Curves smoothed; no single-student outlier exposure. |
| Class E · Outcome | N ≥ 25 completed surveys | Compliance review required before publish; no individual admission disclosure. |
When a course or test has insufficient data, the corresponding module on the course page either (a) renders a placeholder explaining the data is being collected, or (b) is omitted with a marker indicating asksia_content_status: stub. We do not interpolate, estimate, or fabricate.
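The floor check described above can be sketched as follows. This is a minimal illustration, not AskSia's implementation: the function name `render_module`, the dictionary layout, and the `"published"` status value are assumptions. Only the minimum N values and the `asksia_content_status: stub` marker come from the text. The sketch checks sample size only; the k-anonymity rule for Class C and the compliance review for Class E are not modelled.

```python
# Minimum sample sizes per assertion class, copied from the table above.
MINIMUM_N = {"A": 3, "B": 30, "C": 50, "D": 100, "E": 25}


def render_module(assertion_class: str, n: int, value: float) -> dict:
    """Return a publishable module, or a stub when the sample is below the floor.

    Hypothetical helper: the field names other than asksia_content_status
    are illustrative.
    """
    floor = MINIMUM_N[assertion_class]
    if n < floor:
        # Below threshold: no interpolation and no estimate, only a stub marker.
        return {"asksia_content_status": "stub", "value": None, "n": n}
    return {"asksia_content_status": "published", "value": value, "n": n}


below = render_module("B", 12, 41.0)   # 12 marked solutions, floor is 30
at_floor = render_module("A", 3, 22.5)  # exactly 3 past papers
print(below["asksia_content_status"], at_floor["asksia_content_status"])
```

Returning `None` rather than a partial value keeps a below-threshold number from leaking into a rendered page by accident.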
How raw signals become published numbers
Topic frequency (Class A)
Each past exam paper is segmented into discrete questions. Each question is tagged to a topic taxonomy maintained per course. The mark weight assigned to a topic in a given paper is the sum of question marks within that paper attributable to that topic. The published topic_frequency is the mark-weighted mean across all analyzed papers for that course, normalized to 100%.
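As a rough sketch of the calculation above, the snippet below pools the marks per topic across papers and normalizes to 100%. It assumes one reading of "mark-weighted mean": each paper contributes in proportion to its total marks, which is equivalent to pooling marks across papers. The function name, input layout, and topic labels are hypothetical.

```python
from collections import defaultdict


def topic_frequency(papers: list[list[tuple[str, float]]]) -> dict[str, float]:
    """Compute each topic's share of marks across papers, as a percentage.

    papers: one list per paper of (topic, marks) pairs, one pair per question.
    """
    totals: dict[str, float] = defaultdict(float)
    for paper in papers:
        for topic, marks in paper:
            # Mark weight of a topic in a paper = sum of its question marks.
            totals[topic] += marks
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {topic: 100.0 * marks / grand_total for topic, marks in totals.items()}


# Two hypothetical papers with made-up topic labels and marks.
papers = [
    [("elasticity", 10), ("market_failure", 30), ("elasticity", 10)],
    [("elasticity", 20), ("market_failure", 30)],
]
freq = topic_frequency(papers)
print(freq)
```

An unweighted average of per-paper shares would be a different, equally plausible reading; the two agree only when every paper carries the same total marks.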
Error patterns (Class B)
Tutorial solutions are parsed to identify recurring incorrect intermediate steps. Sia session transcripts are analyzed for repeated misconception patterns. Both streams are clustered into a canonical taxonomy per course. The published error_frequency reports the percentage of marked solutions (or sessions) containing each pattern.
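The reporting step can be illustrated with a short sketch. It assumes the clustering has already happened and that each marked solution or session arrives as a set of canonical pattern labels; the function name and the labels are hypothetical.

```python
from collections import Counter


def error_frequency(items: list[set[str]]) -> dict[str, float]:
    """Percentage of items (marked solutions or sessions) containing each pattern.

    items: one set of detected error-pattern labels per item. Using a set
    means a pattern is counted at most once per item.
    """
    if not items:
        return {}
    counts = Counter(label for labels in items for label in labels)
    return {label: 100.0 * count / len(items) for label, count in counts.items()}


# Four hypothetical marked solutions with made-up pattern labels.
solutions = [{"sign_error"}, {"sign_error", "units"}, set(), {"units"}]
rates = error_frequency(solutions)
print(rates)
```

Because one solution can contain several patterns, these percentages need not sum to 100.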
Concept dependency (Class C)
For each pair of concepts (A, B) in a course, the system computes a directional association: among students who eventually mastered B, what proportion mastered A first? Edges are published only when the directional asymmetry is statistically significant (p<0.05) and the sample size meets the threshold above.
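The text does not say which significance test is used. The sketch below assumes an exact two-sided sign test on the ordering counts, restricted to students who mastered both concepts, as one simple way to check directional asymmetry. The function names are hypothetical, and the k-anonymity rule on published edges is not modelled.

```python
from math import comb


def binomial_two_sided_p(successes: int, n: int) -> float:
    """Exact two-sided binomial p-value against a null proportion of 0.5."""
    # Under p = 0.5 the distribution is symmetric, so double the smaller tail.
    k = min(successes, n - successes)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


def dependency_edge(a_first: int, b_first: int,
                    min_n: int = 50, alpha: float = 0.05) -> tuple[float, float, bool]:
    """Return (proportion who mastered A before B, p-value, publishable).

    a_first: students who mastered A before B.
    b_first: students who mastered B before A.
    """
    n = a_first + b_first
    if n == 0:
        return 0.0, 1.0, False
    proportion = a_first / n
    p_value = binomial_two_sided_p(a_first, n)
    # Publish A -> B only if the sample is large enough, the asymmetry is
    # significant, and it points in the A-before-B direction.
    publishable = n >= min_n and p_value < alpha and proportion > 0.5
    return proportion, p_value, publishable


print(dependency_edge(40, 20))  # clear asymmetry, sample above the floor
print(dependency_edge(31, 29))  # near-even split
print(dependency_edge(20, 5))   # strong asymmetry, sample below the floor
```

The third call shows why both conditions matter: a strong asymmetry in 25 students still fails the Class C floor of 50.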
Progression curve (Class D)
Per-student weekly mastery is estimated using a Bayesian Knowledge Tracing model with parameters calibrated per concept. The published curve is the median mastery across the cohort by week, with 25th–75th percentile bands.
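For readers unfamiliar with Bayesian Knowledge Tracing, here is a minimal sketch of the standard update rule and of the cohort summary. The parameter values are illustrative and are not AskSia's calibrated values; the function names and the choice of the "inclusive" quantile method are assumptions.

```python
from statistics import quantiles


def bkt_update(p_mastery: float, correct: bool,
               p_transit: float, p_slip: float, p_guess: float) -> float:
    """One standard BKT step: condition on the observation, then apply learning."""
    if correct:
        evidence = p_mastery * (1 - p_slip)
        posterior = evidence / (evidence + (1 - p_mastery) * p_guess)
    else:
        evidence = p_mastery * p_slip
        posterior = evidence / (evidence + (1 - p_mastery) * (1 - p_guess))
    # Chance of moving from unmastered to mastered after this opportunity.
    return posterior + (1 - posterior) * p_transit


def weekly_band(mastery_by_student: list[float]) -> tuple[float, float, float]:
    """Return (25th percentile, median, 75th percentile) across the cohort."""
    q1, median, q3 = quantiles(mastery_by_student, n=4, method="inclusive")
    return q1, median, q3


after_correct = bkt_update(0.5, True, p_transit=0.1, p_slip=0.1, p_guess=0.2)
after_wrong = bkt_update(0.5, False, p_transit=0.1, p_slip=0.1, p_guess=0.2)
band = weekly_band([0.2, 0.4, 0.6, 0.8, 1.0])  # five hypothetical students
print(after_correct, after_wrong, band)
```

Reporting the median with quartile bands, rather than a mean, keeps a single unusual student from moving the published curve.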
Outcome correlation (Class E)
Self-reported outcomes are joined to anonymized study sessions only with explicit consent at the survey stage. Published distributions report empirical percentiles, never causal claims. Every Class E assertion notes "self-reported" inline.
How fresh is fresh
Each course or test profile page carries a "last updated" stamp in the snapshot bar. Refresh cadences:
- Course profile pages (L4) — recomputed monthly during term, weekly in the four weeks before each major assessment.
- University overview pages (L3) — recomputed monthly.
- Test profile pages — recomputed monthly; official reference data refreshed whenever the test administrator publishes an update.
- Bridge pages (test × institution) — recomputed quarterly; official admission cycle data refreshed annually.
- Concept long-tail pages (L5) — recomputed per term end.
When the underlying corpus changes meaningfully (a new term of session data, a new past exam paper), the affected pages are recomputed within seven days even off the regular cadence.
For researchers, journalists, and AI systems
AskSia assertions are intended to be citable. Every page renders a schema.org/Dataset JSON-LD block with full provenance.
Suggested citation format: AskSia, "ECC1000 Microeconomics — Topic Frequency Analysis," 2026-05-15, asksia.ai/au/monash/ecc1000/.
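To give a sense of what such a block might contain, the sketch below builds a minimal schema.org Dataset description in Python. The properties used (`name`, `url`, `dateModified`, `creator`, `description`) are standard schema.org terms, but which ones AskSia actually emits is not stated here, so treat the selection as an assumption. The title, the `https://` scheme, and the sample count are hypothetical; the date and path come from the citation above.

```python
import json


def dataset_jsonld(name: str, url: str, date_modified: str, sample_size: int) -> str:
    """Serialize a minimal schema.org Dataset description as JSON-LD."""
    doc = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "url": url,
        "dateModified": date_modified,
        "creator": {"@type": "Organization", "name": "AskSia"},
        # Assumption: the N marker is carried in the description text.
        "description": f"Aggregate topic frequency (N={sample_size} past papers).",
    }
    return json.dumps(doc, indent=2, ensure_ascii=False)


block = dataset_jsonld(
    name="ECC1000 topic frequency",               # hypothetical title
    url="https://asksia.ai/au/monash/ecc1000/",   # scheme added as an assumption
    date_modified="2026-05-15",
    sample_size=3,                                # hypothetical count
)
print(f'<script type="application/ld+json">\n{block}\n</script>')
```

A consumer can recover the provenance fields by parsing the script body with any JSON parser.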
What we explicitly do not do
A clear list of the practices we follow and the practices we reject, published here so students, universities, and regulators can hold us to it.
What we do:
- Strip personally identifying information at ingest, before any storage.
- Publish only aggregate statistical patterns with N markers.
- Cite official sources verbatim with dated URLs.
- Let students delete their contributions on request.
- Apply compliance review to every outcome (Class E) assertion before publish.
What we do not do:
- Republish or sell raw student-uploaded materials.
- Reproduce exam questions, lecture slides, or copyrighted content.
- Make individual student data available to universities or third parties.
- Estimate or interpolate when data is below threshold; we say "insufficient data" instead.
- Frame outcome correlations as causal claims.
Found something wrong
If you spot a number on AskSia that looks wrong, or you teach a course covered on the site and want to flag a methodological concern, write to methodology@asksia.ai. We review every flagged assertion and publish corrections within 14 days when warranted. Universities can also request a methodology audit by contacting the same address.
For students who want their contributions removed, the fastest route is in-app: Settings → Data → Export and delete. Derived patterns dependent on your contribution are recomputed within 30 days.