Wednesday, April 6, 2022

How to dummy code a variable using syntax

This post explains how to dummy code a variable in Stata.  The syntax is below, first without comments:

tab1 origvar

generate newvar = -99

replace newvar = 0 if origvar == 1

replace newvar = 1 if origvar == 2

replace newvar = 1 if origvar == 3

replace newvar = 1 if origvar == 4

replace newvar = 1 if origvar == 5

mvdecode newvar, mv(-99)

tab1 newvar origvar

Here is the same syntax, with notes explaining what I am doing with each step:

tab1 origvar

First, review your original variable before making any changes.  This line tells Stata to display a frequency distribution for the variable you will be working from.  Let's assume the original variable has missing-value codes of -5, -3, and -1, and valid values of 1 through 5.  To dummy code it, values of 1 need to become 0 in your new variable and values of 2 through 5 need to become 1.  All of the missing-value codes also need to be designated as missing.
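If you want to follow along without your own data, a small practice dataset can be typed in directly.  This is only a sketch: the ten cases below are made up for illustration, and 'clear' drops whatever data is currently in memory, so save your work first.

```stata
* Hypothetical practice data (values invented for illustration)
* WARNING: clear removes the dataset currently in memory
clear
input origvar
-5
-3
-1
1
1
1
2
3
4
5
end

* Frequency distribution of the original variable
tab1 origvar
```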

generate newvar = -99

Next, create a new variable to work from, since we won't change the original variable in your dataset.  This generates a new variable named 'newvar' and sets every value to -99.

replace newvar = 0 if origvar == 1

This line replaces the -99 on the new variable with 0 wherever the original variable (origvar) is equal to 1.

replace newvar = 1 if origvar == 2

This line replaces the -99 on the new variable with 1 wherever the original variable (origvar) is equal to 2.

replace newvar = 1 if origvar == 3

This line replaces the -99 on the new variable with 1 wherever the original variable (origvar) is equal to 3.

replace newvar = 1 if origvar == 4

This line replaces the -99 on the new variable with 1 wherever the original variable (origvar) is equal to 4.

replace newvar = 1 if origvar == 5

This line replaces the -99 on the new variable with 1 wherever the original variable (origvar) is equal to 5.
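Since values 2 through 5 all receive the same code, the four lines for those values could be collapsed into one with Stata's inrange() function.  Here is a minimal sketch, assuming origvar holds only the whole-number codes described earlier and that newvar has already been generated:

```stata
* Values of 1 on the original variable become 0
replace newvar = 0 if origvar == 1

* inrange(origvar, 2, 5) is true when 2 <= origvar <= 5,
* so this one line covers the values 2, 3, 4, and 5
replace newvar = 1 if inrange(origvar, 2, 5)
```

Writing each value out on its own line is longer, but it is easier to adapt when the values you want to group are not consecutive.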

mvdecode newvar, mv(-99)

This line designates the remaining values of -99 on 'newvar' as missing.  Because every case started at -99 and only the cases with original values of 1 through 5 were replaced, the cases still equal to -99 are the ones whose original value was a missing-value code (-5, -3, or -1) or was already missing.  This takes care of all the missing data and the proper designation.
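If you would rather skip the -99 placeholder, one alternative is to let the recode command build the variable in a single step.  This is a sketch of a different route, not the method used in this post, and the name newvar2 is hypothetical, chosen only so it does not collide with newvar:

```stata
* 1 becomes 0, 2 through 5 become 1, and everything else
* (including -5, -3, and -1) becomes missing
recode origvar (1 = 0) (2/5 = 1) (else = .), generate(newvar2)

* Compare the two versions; every case should fall on the diagonal
tabulate newvar newvar2, missing
```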

tab1 newvar origvar

This line tells Stata to display a frequency distribution for both your new variable and the original variable in your dataset.  You should do this last step to ensure your new variable was created correctly.  If your original variable had 15 cases equal to 1, your new variable should have 15 cases equal to 0.  If your original variable had 10 cases equal to 2, 10 cases equal to 3, 10 cases equal to 4, and 10 cases equal to 5, your new variable should have 40 cases equal to 1.  Double check the total number of cases as well.
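A two-way table can make this check easier to read than two separate frequency distributions, because it shows which original values landed in each category of the new variable.  Here is a sketch, assuming all of the steps above (including mvdecode) have already been run:

```stata
* Cross-tabulate old against new; the missing option keeps
* missing values in the table instead of dropping them
tabulate origvar newvar, missing

* Optional: stop with an error if any case was coded incorrectly
assert newvar == 0 if origvar == 1
assert newvar == 1 if inrange(origvar, 2, 5)
assert missing(newvar) if !inrange(origvar, 1, 5)
```

If every assert line runs without an error message, each case was coded as intended.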
