vignettes/making-an-sps-file.Rmd
making-an-sps-file.Rmd
Most times you will deal with a .dat+.sps file pair, you will be given the .sps file already made. However, in rare cases - such as working with FBI data - you need to make you own based on a PDF guide provided.
The function make_sps_setup
makes that a bit easier by doing all the formatting needed, meaning you just need to provide the info for each section. It will create a .sps setup file and save it in your working directory.
There are two parameters absolutely needed - file_name
and col_positions
- as well as four optional parameters.
file_name
- Name of the file to be saved (e.g. “setup_file1”). There is no need to put the .sps extension in the file name.col_positions
- Either a vector of strings indicating the start and end position of each column (e.g. “1-3”, “4-5”) or a vector of the widths of the columns (e.g. 3, 2).col_names
- A vector of names for the columns. If none are provided, will automatically create names based on column number (e.g. V1, V2, V3).col_labels
- A vector of labels for the columns. These are often longer and more descriptive than the col_names. These are the values used as column names if real_names = TRUE in reading in the data.value_labels
- A vector with the value first, then an ’ = ’ then the label. Each new column should have the column named followed by ’ =’.missing_values
- A vector of strings with the column name followed by the values to be replaced by NA.file_name
is simply the string of the name you want the file to be saved as, such as “setup_file_example”. Including the “.sps” to the end is not required.
This is a vector either of strings indicating the width of each column (starting at 1) or the starting and ending point of that column with a “-” between the start and end number.
col_positions <- c("1-1", "2-3", "4-5", "6-11")
This is a vector of strings with names for the column. Only useful if the guide you’re following to make this .sps file includes these names and you want to be exact. If you don’t include this, it will just name the columns V1:Vnumber_of_columns.
col_names <- c("var1", "var2", "var3", "var4")
This is the same as col_names but each column name should be the descriptive name of the column. These names are the ones that will be used if real_names = TRUE in read_ascii_setup.
col_labels <- c("version_number", "victim_sex", "victim_race", "state")
This is a named vector with the column name as its own string followed by a string for each value-label pair in that column. The syntax is ‘value = label’. For the column name, the label is blank so it appears ‘column =’. For example, if ‘sex’ is the column name as it has value labels for M = Male and F = Female, the value_labels parameter will be:
value_labels = c(“sex =”, “M = Male”, “F = Female”)
value_labels <- c("victim_sex = ",
"MA = male",
"FE = female",
"UN = unknown",
"victim_race = ",
"WH = white",
"BL = black",
"IA = american indian or alaskan native",
"AS = asian",
"UN = unknown")
This is a vector of strings with the column name (can be either the name in col_names or col_labels but must be consistent) followed by all the values that represent missing values, each in its own string.
missing_values <- c("victim_sex",
"-9",
"-8",
"victim_race",
"-8")
asciiSetupReader::make_sps_setup(file_name = "setup_file_example",
col_positions = col_positions,
col_names = col_names,
col_labels = col_labels,
value_labels = value_labels,
missing_values = missing_values)