ECON2515 · Intermediate Applied Econometrics II
Foundations of Econometrics
Week 1 sets the frame within which the rest of ECON 2515 is judged. Econometrics is economic measurement: the use of statistics on real-world, mostly observational data to answer causal questions (does education raise income? does a subsidy change behaviour?) rather than merely to spot patterns. Its four goals are to estimate relationships, test theories, forecast, and evaluate policy, and what sets it apart from plain statistics and data science is the line between causal and predictive questions. The chapter also drills the four data structures (cross-section, time series, pooled cross-sections and panel), because the shape of the data dictates the method. The single rule carried into every later topic: a regression coefficient is not causal unless you can argue convincingly that nothing else (a confounder, reverse causality, or chance) is driving it.
What this chapter covers
- 1. What econometrics is: statistics on real-world, mostly observational economic data, used to quantify relationships
- 2. The four goals: estimate a relationship, test a theory, forecast, evaluate a policy or decision
- 3. Causal vs predictive: the dividing question separating econometrics from statistics and data science
- 4. Causal effect and ceteris paribus: how y changes when x changes, holding all else relevant equal
- 5. Regression ≠ causation: four rival explanations for an association, namely A→B, reverse causality B→A, a confounder C driving both, or chance
- 6. Data structures: cross-section, time series, pooled cross-sections, panel, and how each dictates the method
- 7. Panel vs pooled: a panel follows the SAME units over time (so it can remove time-invariant unobservables); pooled cross-sections use DIFFERENT units each period
- 8. Preliminary analysis and tools: plot first (histogram, scatter, bar chart, time-series plot); the R / RStudio workflow
Classify the data and judge whether a claim is causal
- (a) (+3) Match each dataset to its structure. (i) is a cross-section: many units (council areas) at one point in time. (ii) is a time series: one unit (the nation) tracked over many ordered periods. (iii) is a panel (longitudinal) dataset: the SAME council areas followed over several periods.
- (b) (+2) The bike-shop claim is predictive because a lurking confounder can drive both variables together: denser, wealthier, more bike-friendly areas tend to have both more bike shops and more cycling. So the correlation predicts where cycling is high but does not isolate the effect of bike shops holding all else equal (ceteris paribus). Finding a pattern without an identification strategy is closer to data science than to econometrics.
- (c) (+1) Dataset (iii), the panel, is best for causation: following the same council areas before and after the cycleway opens lets her difference out fixed area traits (density, income, terrain) and get closer to the causal effect of the network itself.
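To see answer (b) in numbers, here is a minimal simulation sketch. The course does this in R (for example `lm(cycling ~ shops + density)`); this Python version only shows the arithmetic. Everything in it is hypothetical: the variable names, the coefficients 2.0 and 3.0, and the assumption that density is the only confounder. By construction bike shops have zero effect on cycling, yet the naive slope is clearly positive. Controlling for density (here by residualising both variables on it, which by the Frisch–Waugh–Lovell theorem gives the same coefficient as adding density as a regressor) pushes the slope back towards zero.

```python
import random


def mean(v):
    return sum(v) / len(v)


def ols_slope(x, y):
    """Simple-regression OLS slope: sample Cov(x, y) divided by sample Var(x)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residualise(v, on):
    """Residuals from regressing v on a single control (with an intercept)."""
    b = ols_slope(on, v)
    mv, mo = mean(v), mean(on)
    return [vi - mv - b * (oi - mo) for vi, oi in zip(v, on)]


def simulate(n=5000, seed=1):
    """Hypothetical areas: density drives BOTH bike shops and cycling.
    By construction, bike shops have zero effect on cycling."""
    rng = random.Random(seed)
    density = [rng.gauss(0, 1) for _ in range(n)]
    shops = [2.0 * d + rng.gauss(0, 1) for d in density]
    cycling = [3.0 * d + rng.gauss(0, 1) for d in density]  # no shops term at all
    return density, shops, cycling


density, shops, cycling = simulate()
naive = ols_slope(shops, cycling)  # regress cycling on shops only
controlled = ols_slope(residualise(shops, density), residualise(cycling, density))
print(f"naive slope: {naive:.2f}   controlling for density: {controlled:.2f}")
```

With these made-up coefficients the naive slope should land near 1.2 (the population ratio Cov/Var = 6/5) and the controlled slope near zero. The catch in real data is that you rarely observe every confounder, which is why design matters more than adding controls.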
Key terms
- Econometrics
- Economic measurement: using statistical methods on real-world, usually observational, economic data to quantify relationships, test theories, forecast and evaluate policy — with a focus on causal rather than merely predictive questions.
- Observational data
- Data collected by observing the world as it is, without randomly assigning who gets the treatment. Because we cannot randomly assign education, wages or policies, many factors move together, which is why isolating a causal effect is hard.
- Ceteris paribus
- 'All else equal.' The thought experiment that defines a causal effect: how y changes when one x changes while every other relevant factor is held constant. Econometrics recreates it by design (controls, panels, instruments) rather than by lab control.
- Confounder
- A third variable that influences both x and y, creating a spurious association between them. Population density lifting both the number of clinics and the number of GP visits is a classic example.
- Cross-sectional data
- Many units (people, firms, postcodes) observed at a single point in time, usually treated as independent draws from random sampling; watch for non-response and clustering.
- Time series data
- One (or a few) variables tracked over time. Order matters, and the observations are typically trended, seasonal and serially correlated — so they are not independent.
- Panel (longitudinal) data
- The SAME units followed over multiple periods, giving both a cross-sectional and a time dimension. Its advantage is controlling for time-invariant unobservables and studying lagged effects — not simply 'having more data'.
- Pooled cross-sections
- Several independent cross-sections stacked across time, using DIFFERENT units in each period (this year's sample is not last year's). Useful for tracking how a relationship shifts after a policy change; distinct from a panel.
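The four definitions above can be turned into a rough rule of thumb based only on which units appear in which periods. The sketch below is hypothetical (the function name and the `(unit_id, period)` input format are illustrative assumptions) and deliberately crude: it treats any unit that recurs across periods as evidence of a panel, even an unbalanced one, and it ignores whether a "time series" tracks one variable or a few.

```python
from collections import defaultdict


def classify_structure(rows):
    """Rough rule of thumb. rows: iterable of (unit_id, period) pairs,
    one per observation. Returns one of the four data structures."""
    units_by_period = defaultdict(set)
    for unit, period in rows:
        units_by_period[period].add(unit)
    all_units = set().union(*units_by_period.values())
    if len(units_by_period) == 1:
        return "cross-section"           # many units, one point in time
    if len(all_units) == 1:
        return "time series"             # one unit, many ordered periods
    # Count how many periods each unit appears in.
    periods_per_unit = defaultdict(int)
    for units in units_by_period.values():
        for unit in units:
            periods_per_unit[unit] += 1
    if any(count > 1 for count in periods_per_unit.values()):
        return "panel"                   # the SAME units recur over time
    return "pooled cross-sections"       # DIFFERENT units in each period


print(classify_structure([("Area A", 2021), ("Area B", 2021), ("Area C", 2021)]))  # cross-section
print(classify_structure([("nation", y) for y in range(2010, 2020)]))              # time series
print(classify_structure([(a, y) for a in ("A", "B", "C") for y in (2019, 2020)]))  # panel
print(classify_structure([("A", 2019), ("B", 2019), ("C", 2020), ("D", 2020)]))    # pooled cross-sections
```

The design choice worth noticing is that the panel vs pooled decision hinges entirely on unit identity across periods, not on how many rows there are.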
Foundations of Econometrics FAQ
What is the difference between econometrics, statistics and data science?
All three use data, but they answer different questions. Statistics uses a sample to infer about a population; data science optimises prediction and pattern-recognition; econometrics targets the causal question — what happens if we actually change x in the world — usually with data that were only observed, never experimentally assigned. Knowing which job a study is doing is the difference between a defensible answer and a confounded one.
Why does the course keep saying 'regression is not causation'?
A significant coefficient only shows that x and y move together. That association can arise in four ways: x genuinely affects y, y affects x (reverse causality), a confounder drives both, or it is chance in a finite sample. In observational data everything else rarely stays equal, so before calling any estimate a cause you must defend ceteris paribus: name the specific confounders and reverse-causality stories that could explain the association, and argue why your design rules them out.
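The "chance" explanation is easy to underestimate, so here is a small simulation sketch of it. Two variables are generated independently, so any correlation between them is pure sampling noise; the function names, the 0.5 cut-off and the sample sizes are illustrative assumptions, not course conventions.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def share_large_by_chance(n, threshold=0.5, reps=2000, seed=2):
    """Fraction of samples of size n in which two INDEPENDENT variables
    show |r| above the threshold purely by chance."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]  # unrelated to x by construction
        if abs(pearson_r(x, y)) > threshold:
            hits += 1
    return hits / reps


for n in (10, 200):
    print(n, share_large_by_chance(n))
```

With n = 10 roughly one sample in seven should show |r| > 0.5 even though x and y are unrelated, while with n = 200 such a large correlation should essentially never appear. Small samples make chance a serious rival explanation; inference (standard errors, tests) is how later weeks quantify it.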
How do I tell panel data from pooled cross-sections?
Both mix a time and a cross-section dimension, but a panel follows the SAME units over the periods, whereas pooled cross-sections use DIFFERENT units each period. Only the panel can difference out time-invariant characteristics of a unit, which is its real analytical advantage — a very common exam trap is to confuse the two.
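A minimal sketch of why only the panel can difference out fixed traits, assuming a hypothetical two-period setting that mirrors the cycleway exercise: each area has a time-invariant, unobserved trait (think density) that raises cycling and also makes the area more likely to get a cycleway, and the true cycleway effect is set to 1.0. All names and numbers are illustrative. Regressing cycling on the cycleway dummy across all observations is badly biased; taking each area's period-2 minus period-1 change removes the trait entirely. With pooled cross-sections this within-unit differencing is impossible, because no area is observed twice.

```python
import random


def ols_slope(x, y):
    """Simple-regression OLS slope: sample Cov(x, y) / sample Var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_panel(n_areas=4000, effect=1.0, seed=3):
    """Hypothetical panel: rows of (area, period, cycleway, cycling)."""
    rng = random.Random(seed)
    rows = []
    for area in range(n_areas):
        trait = rng.gauss(0, 1)                         # time-invariant, unobserved
        gets_cycleway = trait + rng.gauss(0, 1) > 0     # high-trait areas more likely treated
        for t in (0, 1):
            d = 1.0 if (gets_cycleway and t == 1) else 0.0  # cycleway opens in period 1
            y = 5.0 * trait + effect * d + 0.5 * t + rng.gauss(0, 1)
            rows.append((area, t, d, y))
    return rows


def first_difference(rows):
    """Within-area change from period 0 to period 1; the fixed trait cancels."""
    by_area = {}
    for area, t, d, y in rows:
        by_area.setdefault(area, {})[t] = (d, y)
    dd, dy = [], []
    for periods in by_area.values():
        (d0, y0), (d1, y1) = periods[0], periods[1]
        dd.append(d1 - d0)
        dy.append(y1 - y0)
    return dd, dy


rows = simulate_panel()
naive = ols_slope([r[2] for r in rows], [r[3] for r in rows])
fd = ols_slope(*first_difference(rows))
print(f"pooled OLS ignoring the trait: {naive:.2f}   first difference: {fd:.2f}")
```

With these assumed numbers the naive estimate should come out several times larger than the true effect, while the first-difference estimate should sit close to 1.0; the common time shift of 0.5 is absorbed by the intercept of the differenced regression.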
Do I need to know R for the foundations topic?
Yes. You use R / RStudio throughout the workshops, quizzes and both group assignments to estimate models and produce plots and diagnostics. The final exam is closed-book on paper, but it hands you R output to read, so the examined skill is understanding a printout rather than typing commands.
What maths do I need for Week 1?
The course assumes basic probability (joint, marginal and conditional distributions, independence, the expectation and variance operators, covariance and correlation) and single-variable calculus (power, exp and log rules; the sum, chain, product and quotient rules). These are the tools used to derive the OLS estimator and to compute marginal effects; Week 1 revises them, so brush up early if they are rusty.
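As a sketch of how these tools combine, take the simple model y = β₀ + β₁x + u (the notation here is illustrative; later weeks formalise the assumptions). Covariance rules alone give the first line for any u. The second line shows that the covariance-over-variance ratio recovers β₁ only when Cov(x, u) = 0, which is exactly what a confounder hiding in u breaks. The last line uses the log and chain rules for a marginal effect, assuming E(u | x) = 0.

```latex
\begin{aligned}
\operatorname{Cov}(x, y) &= \operatorname{Cov}(x,\, \beta_0 + \beta_1 x + u)
  = \beta_1 \operatorname{Var}(x) + \operatorname{Cov}(x, u) \\
\frac{\operatorname{Cov}(x, y)}{\operatorname{Var}(x)} &= \beta_1 + \frac{\operatorname{Cov}(x, u)}{\operatorname{Var}(x)}
  \quad\text{(equals } \beta_1 \text{ only if } \operatorname{Cov}(x, u) = 0\text{)} \\
\hat{\beta}_1 &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
  \quad\text{(sample analogue)} \\
y = \beta_0 + \beta_1 \ln x + u \;&\Rightarrow\; \frac{\partial\, \mathrm{E}(y \mid x)}{\partial x} = \frac{\beta_1}{x}
\end{aligned}
```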
How is the foundations material examined?
It appears in Part A multiple-choice as 'what kind of study or data is this?' items, and as the interpretation caveats the Part B worked answers demand ('is this coefficient causal?'). It also underpins the group empirical report, where you must justify the data and design you chose. Check your course outline for exact assessment weights.
Exam move
Treat Week 1 as the lens you will look through for the rest of ECON 2515, not a throwaway introduction. For any study you meet — in the lecture, a quiz, or the exam — practise a three-step reflex: name the goal (estimate, test, forecast, or evaluate a decision), name the data structure (cross-section, time series, pooled, or panel), and decide whether the claim is causal or merely predictive. Whenever the data were only observed, force yourself to write down one specific confounder and one reverse-causality story before you accept a coefficient as a cause — that habit is worth reliable marks and stops the single biggest error the examiner fishes for. Finally, get comfortable in R early and always plot before you model, because every later topic (OLS, inference, specification) assumes you can already read a scatter, a histogram and a regression printout without hesitation.