Generates a balanced panel with lifecycle earnings, a gender gap, selection on treatment timing, and gendered treatment effects. The data-generating process (DGP) is specified under Details.
Arguments
- n_individuals
Integer. Number of individuals (default 10 000).
- treatment_groups
Integer vector. Treatment groups to include (default 24:28).
- seed
Integer or NULL. RNG seed (default 42). The caller's RNG state is saved and restored on exit, so calling this function does not alter the global random stream. Set to NULL to draw from the current RNG state without reseeding.
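The save-and-restore behaviour described for seed can be illustrated with a minimal sketch. The helper draw_with_local_seed() below is hypothetical and is not part of this package; it only shows one common way to reseed locally without disturbing the caller's random stream, and the actual implementation may differ.

```r
# Hypothetical helper, for illustration only (not the package's code).
draw_with_local_seed <- function(n, seed = 42) {
  if (!is.null(seed)) {
    # Record the caller's RNG state; it may not exist in a fresh session.
    has_state <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
    old_state <- if (has_state) get(".Random.seed", envir = globalenv()) else NULL
    on.exit({
      if (has_state) {
        # Put the caller's state back exactly as it was.
        assign(".Random.seed", old_state, envir = globalenv())
      } else {
        # No state existed before the call, so remove the one we created.
        rm(".Random.seed", envir = globalenv())
      }
    }, add = TRUE)
    set.seed(seed)
  }
  # With seed = NULL, this draws from the current stream without reseeding.
  rnorm(n)
}

# The caller's stream is unaffected by the seeded call in between.
set.seed(1)
a <- runif(1)
set.seed(1)
invisible(draw_with_local_seed(3))
b <- runif(1)
identical(a, b)
```

With a non-NULL seed, repeated calls return the same draws, while the surrounding code continues from wherever its own stream left off.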
Details
$$\log Y_{it} = \mu_0 + \lambda D_i + \alpha_i + \beta_1 (a - 20) + \beta_2 (a - 20)^2 + \gamma \cdot \mathbf{1}[f] + \theta_f \cdot \mathbf{1}[f, a \ge D] + \theta_m \cdot \mathbf{1}[m, a \ge D] + \varepsilon_{it}$$
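where \(a\) is the age of individual \(i\) in period \(t\), \(D_i\) (written \(D\) inside the indicators) is the individual's treatment group, \(\mathbf{1}[f]\) and \(\mathbf{1}[m]\) indicate women and men, and \(a \ge D\) marks post-treatment periods. The coefficient \(\gamma\) is the gender gap, and \(\theta_f\) and \(\theta_m\) are the treatment effects for women and men. \(\alpha_i \sim N(0,\sigma_\alpha^2)\) is a permanent individual effect and \(\varepsilon_{it} \sim N(0,\sigma_\varepsilon^2)\) is a transitory shock. The term \(\lambda D_i\) generates positive selection on treatment timing: individuals who have children later earn more, on average, than those who have children earlier.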
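To make the equation concrete, the following is a minimal sketch of how a panel with this structure could be simulated. All parameter values, the age range 20:40, the equal gender split, and the uniform assignment to treatment groups are assumptions chosen for illustration; they are not the defaults used by simulate_data().

```r
# Illustrative sketch of the DGP; every numeric value here is an assumption.
set.seed(1)
n_individuals <- 200
treatment_groups <- 24:28
ages <- 20:40                      # assumed age range

# Hypothetical parameter values
mu0 <- 10;  lambda <- 0.02         # intercept, selection on timing
beta1 <- 0.05; beta2 <- -0.001     # lifecycle earnings profile
gamma <- -0.10                     # gender gap
theta_f <- -0.20; theta_m <- 0.02  # gendered treatment effects
sigma_alpha <- 0.30; sigma_eps <- 0.10

# Individual-level characteristics, fixed over time
ind <- data.frame(
  id     = seq_len(n_individuals),
  female = rbinom(n_individuals, 1, 0.5),
  D      = sample(treatment_groups, n_individuals, replace = TRUE),
  alpha  = rnorm(n_individuals, 0, sigma_alpha)
)

# Balanced panel: one row per individual and age
panel <- ind[rep(seq_len(n_individuals), each = length(ages)), ]
panel$age <- rep(ages, times = n_individuals)
rownames(panel) <- NULL

# Post-treatment indicator 1[a >= D] and log earnings, term by term
post <- as.numeric(panel$age >= panel$D)
log_y <- mu0 + lambda * panel$D + panel$alpha +
  beta1 * (panel$age - 20) + beta2 * (panel$age - 20)^2 +
  gamma * panel$female +
  theta_f * panel$female * post +
  theta_m * (1 - panel$female) * post +
  rnorm(nrow(panel), 0, sigma_eps)

panel$Y <- exp(log_y)              # earnings in levels
panel <- panel[, c("id", "female", "age", "D", "Y")]
head(panel)
```

Each summand in log_y corresponds to one term of the equation, so the role of an individual parameter can be explored by changing it and re-running the sketch.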
Examples
# \donttest{
sim <- simulate_data(n_individuals = 2000)
head(sim)
#>   id female age  D        Y
#> 1  1      1  20 24 40643.68
#> 2  1      1  21 24 36525.46
#> 3  1      1  22 24 39327.60
#> 4  1      1  23 24 34390.98
#> 5  1      1  24 24 46404.27
#> 6  1      1  25 24 46884.98
# }
