The parse_setup function is what this package uses to convert .sps or .sas setup files into a format usable in R.

It returns a list of length 3 containing the objects “setup”, “value_labels”, and “missing”.

setup

The first object in the returned list is a data.frame with 4 columns and as many rows as there are columns in the data. The “column_number” column holds the non-descriptive name of each data column (such as V1), while “column_name” holds its descriptive name. In read_ascii_setup, setting use_clean_names to TRUE sets the data column names to the “column_name” values; otherwise they remain the “column_number” values. Since the data is in fixed-width format, you need to know the location of each column. The “begin” and “end” columns in this object provide that location for each column in the data.
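To illustrate how “begin” and “end” locate columns in a fixed-width file, here is a minimal sketch in base R. The setup table and the two raw records are made up for illustration and do not come from a real setup file, and the package itself may read the data differently.

```r
# Hypothetical setup table with the four columns described above
setup <- data.frame(
  column_number = c("V1", "V2", "V3"),
  column_name   = c("ID", "STATE", "YEAR"),
  begin         = c(1, 3, 5),
  end           = c(2, 4, 8),
  stringsAsFactors = FALSE
)

# Two made-up fixed-width records (8 characters each)
raw_lines <- c("01021999", "02052001")

# Cut every record at the begin/end positions of each column
cols <- lapply(seq_len(nrow(setup)), function(i) {
  substr(raw_lines, setup$begin[i], setup$end[i])
})
names(cols) <- setup$column_number

parsed <- as.data.frame(cols, stringsAsFactors = FALSE)
parsed
```

Replacing `setup$column_number` with `setup$column_name` in the `names()` call would give the descriptive names instead, which is roughly the choice that use_clean_names controls.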

value_labels

To make the data more compact, data files often store short values that stand for longer labels. For example, a column about a participant’s gender may include only “M” and “F”, which stand for “Male” and “Female”. The setup file will say that M = Male and F = Female. The value labels tell us that we need to convert M to Male in the given column. This object is a list of named vectors, each giving the values and their corresponding labels. If there are no value labels this object will be NULL.

example$value_labels[1:3]
#> $V1
#> SHR master file 
#>             "6" 
#> 
#> $V2
#>         Alabama         Arizona        Arkansas      California 
#>             "1"             "2"             "3"             "4" 
#>        Colorado     Connecticut        Delaware Washington, D.C 
#>             "5"             "6"             "7"             "8" 
#>         Florida         Georgia           Idaho        Illinois 
#>             "9"            "10"            "11"            "12" 
#>         Indiana            Iowa          Kansas        Kentucky 
#>            "13"            "14"            "15"            "16" 
#>       Louisiana           Maine        Maryland   Massachusetts 
#>            "17"            "18"            "19"            "20" 
#>        Michigan       Minnesota     Mississippi        Missouri 
#>            "21"            "22"            "23"            "24" 
#>         Montana        Nebraska          Nevada   New Hampshire 
#>            "25"            "26"            "27"            "28" 
#>      New Jersey      New Mexico        New York  North Carolina 
#>            "29"            "30"            "31"            "32" 
#>    North Dakota            Ohio        Oklahoma          Oregon 
#>            "33"            "34"            "35"            "36" 
#>    Pennsylvania    Rhode Island  South Carolina    South Dakota 
#>            "37"            "38"            "39"            "40" 
#>       Tennessee           Texas            Utah         Vermont 
#>            "41"            "42"            "43"            "44" 
#>        Virginia      Washington   West Virginia       Wisconsin 
#>            "45"            "46"            "47"            "48" 
#>         Wyoming          Alaska          Hawaii      Canal Zone 
#>            "49"            "50"            "51"            "52" 
#>     Puerto Rico  American Samoa            Guam  Virgin Islands 
#>            "53"            "54"            "55"            "62" 
#> 
#> $V4
#>               Possessions         ALL cit 250,000 + 
#>                       "0"                       "1" 
#>       Cit 100,000-249,999         Cit 50,000-99,999 
#>                       "2"                       "3" 
#>         Cit 25,000-49,999         Cit 10,000-24,999 
#>                       "4"                       "5" 
#>           Cit 2,500-9,999               Cit < 2,500 
#>                       "6"                       "7" 
#>               Non-MSA co.              MSA counties 
#>                       "8"                       "9" 
#>           Cit 1,000,000 +       Cit 500,000-999,999 
#>                      "1A"                      "1B" 
#>       Cit 250,000-499,999     Non-MSA co. 100,000 + 
#>                      "1C"                      "8A" 
#> Non-MSA co. 25,000-99,999 Non-MSA co. 10,000-24,999 
#>                      "8B"                      "8C" 
#>      Non-MSA co. < 10,000         Non-MSA St Police 
#>                      "8D"                      "8E" 
#>         MSA co. 100,000 +     MSA co. 25,000-99,999 
#>                      "9A"                      "9B" 
#>     MSA co. 10,000-24,999          MSA co. < 10,000 
#>                      "9C"                      "9D" 
#>             MSA St Police 
#>                      "9E"

There is one named vector for each column in the data that has value labels. We can see how many there are using length().

length(example$value_labels)
#> [1] 141
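As a rough sketch of how one of these named vectors could be used to swap values for labels, the base R example below matches each value in a column against the vector and returns the corresponding name. The `labels` vector is a shortened, hand-written copy of the V2 labels shown above, and `codes` is made-up data; this is not necessarily how the package applies labels internally.

```r
# Shortened, hand-written version of a value label vector:
# names are the labels, elements are the values stored in the data
labels <- c(Alabama = "1", Arizona = "2", Arkansas = "3")

# Made-up column of raw values
codes <- c("2", "1", "3", "2")

# Look up the position of each value, then take the matching label
labelled <- names(labels)[match(codes, labels)]
labelled
```

Values with no matching label come back as NA with this approach, so a real workflow would likely need to keep the original value in that case.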

missing

The final object in the list is a data.frame with two columns and one row for each combination of column and missing value. A missing value is a value in the data that should be replaced with NA. For example, data often uses negative values such as -8 or -9 to indicate that a value is missing and should be NA. The column “variable” indicates the column in the data and the column “values” gives the value in that column that is to be replaced with NA. For example, if there are 10 columns in the data with missing values and each column has two missing values (e.g. -8 and -9) there will be 20 rows in this data.frame. If there are no missing values this object will be NULL.

head(example$missing)
#> NULL
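Since the example above has no missing values, here is a minimal sketch with a made-up “missing” table showing how such an object could be used to replace values with NA. The column names `variable` and `values` follow the description above; the data, and the assumption that the values are stored as character strings, are hypothetical.

```r
# Hypothetical missing-value table: one row per column/value pair
missing <- data.frame(
  variable = c("V5", "V5", "V6"),
  values   = c("-8", "-9", "-9"),
  stringsAsFactors = FALSE
)

# Made-up data to clean
data <- data.frame(
  V5 = c("3", "-8", "-9"),
  V6 = c("-9", "7", "-8"),
  stringsAsFactors = FALSE
)

# Skip the step entirely when there are no missing values (NULL)
if (!is.null(missing)) {
  for (col in unique(missing$variable)) {
    # Values declared as missing for this particular column
    bad <- missing$values[missing$variable == col]
    data[[col]][data[[col]] %in% bad] <- NA
  }
}
data
```

Note that -8 in V6 is left alone because only -9 is listed as missing for that column; the replacement is done per column, not globally.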