************
* Tutorial *
* ExampleA *
************

*** Single C-Variables ***

** Mean Value analysis (Baseline) **

* Loading the data
clear
use "C:\Users\fbizz\Dropbox\Dissertation\Data\vdem_cy_v5.dta"

* Setting up the dataset
xtset country_id year

* Running the analysis with mean value
xtreg F.e_peinfmor v2cldiscw e_migdppcln , fe vce(cluster country_id)

***************************************
*** Incorporating Measurement Error ***
***************************************

*** Generating Posterior File in Stata ***

*** Setting up Stata ***
*Clears Data and Existing Matrices
clear
matrix drop _all

*Changes Working Directory
cd "C:\Users\fbizz\Dropbox\ShamrockSeries\Fernando\tutorial_CIs" /*INSERT THE DIRECTORY WHERE YOU EXTRACTED THE DATA FILES*/

*** Loading and preparing the data ***
* load posteriors file
import delimited "v2cldiscw.10000.z.sample.csv", clear

* generate year and country_text_id
gen country_text_id = substr(v1,1,3)
gen year = substr(v1,5,4)
destring year, replace

* when there are multiple observations for a particular country-year, keep the latest observation (i.e. toward the end of the year rather than the beginning)
gen obs_sort =_n
gsort -obs_sort
duplicates drop country_text_id year, force
sort obs_sort
drop v1 obs_sort
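
* Optional check (a sketch, not part of the original workflow): confirm that
* each country-year now appears only once. isid stops with an error if
* country_text_id and year do not uniquely identify the observations.
isid country_text_id year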

* add country_id and the point-estimate from V-Dem Country-Year dataset
merge 1:1 country_text_id year using "C:\Users\fbizz\Dropbox\Dissertation\Data\vdem_cy_v5.dta", keepusing(country_id v2cldiscw) nogenerate

* Rename the mean point-estimate
rename v2cldiscw cldiscw

* carry forward and rename variables
xtset country_id year
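
* Note: carryforward is a user-written command, not part of official Stata.
* A minimal sketch, assuming the package is available from SSC and that this
* machine has internet access: install it only if Stata cannot find it.
capture which carryforward
if _rc != 0 {
ssc install carryforward
}
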
foreach var of varlist v2-v901 {
* sort by year within country so that values are carried forward in time order
qui bysort country_id (year): carryforward `var', replace
qui replace `var' = . if cldiscw== .
qui rename `var' cldiscw_`var'
}

* reorder the new dataset
order country_text_id country_id year cldiscw

* Drop country-years in which the variable of interest (the one with posteriors) is missing
drop if cldiscw == .

* Save the new file
save "C:\Users\fbizz\Dropbox\ShamrockSeries\Fernando\tutorial_CIs\cldiscw.dta", replace

***************************************

*** Analysis Section ***

*** Analysis Preparation ***
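*Merge with V-Dem Country-Year data
*(both files are unique by country_text_id and year, so a 1:1 merge is used; m:m merges are unreliable)
merge 1:1 country_text_id year using "C:\Users\fbizz\Dropbox\Dissertation\Data\vdem_cy_v5.dta", nogenerate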

*Sets Data
xtset country_id year

* Set Matrix Size to fit analysis
set matsize 5000


*** Analysis ***
*** Monte Carlo Estimates Using V-Dem 900-Draw Posterior Distribution ***
*Run the Monte Carlo
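
* The loop below uses drawnorm, which draws random numbers, so results change
* from run to run unless a seed is set. A minimal sketch: the value 12345 is an
* arbitrary placeholder, not a value taken from the original analysis.
set seed 12345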

forvalues i = 2/901 {

*Print out an iteration number
display `i'

*Fit the model, using one draw from the V-Dem posterior per iteration
*(variables cldiscw_v2 through cldiscw_v901 hold the 900 draws)
quietly xtreg F.e_peinfmor cldiscw_v`i' e_migdppcln, fe vce(cluster country_id) 

*Extract the coefficients and variance-covariance matrix
matrix b = e(b)
matrix V = e(V)
local blength = colsof(b)
matrix rsq = e(r2)

*Preserve the dataset, take a single multivariate normal draw from the
*posterior distribution of the coefficients, and restore the dataset.
*We use the capture command to catch possible errors in drawnorm
*and drop these iterations gracefully.
preserve 
capture quietly drawnorm b1-b`blength', double n(1) means(b) cov(V) clear
if _rc==0 {
mkmat b1-b`blength', matrix(bsample)
matrix posterior = nullmat(posterior) \ bsample
matrix rsquared = nullmat(rsquared) \ rsq
}
else {
display "Error drawing sample...iteration dropped"
}
restore

*Closes the Monte Carlo Loop
}

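*Convert the posterior and rsquared matrices to variables, ready to work with
*(the posterior variables follow the column order of e(b): cldiscw draw, e_migdppcln, constant)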
svmat posterior
svmat rsquared
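
* Optional check (a sketch): count how many of the 900 iterations were kept,
* i.e. those for which drawnorm succeeded. posterior1 is the first variable
* that svmat creates from the posterior matrix.
count if !missing(posterior1)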

*Calculate means and standard deviations
tabstat posterior*, stat(mean sd)

*Find the bounds of the 95 percent credible interval
centile posterior*, centile(2.5, 97.5)

* Find the R-Squared
tabstat rsquared*, stat(mean sd)