Project author: nietsnel

Project description:
Institutional research
Language: R
Repository: git://github.com/nietsnel/inst.research.git
Created: 2016-09-12T21:08:29Z
Project community: https://github.com/nietsnel/inst.research

License:


To install the latest version of inst.research from GitHub:

    # install.packages("devtools")
    devtools::install_github("nietsnel/inst.research")

Functions Overview

  • bulk_import(): imports multiple .txt or .csv files into R (website example forthcoming).
  • usm_labels(): labels variables and raw values using one of three methods.
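
Until the bulk_import() example is published, the general idea of importing several files at once can be sketched in base R. This is only an illustration under stated assumptions: read_csv_dir() is a hypothetical helper, not part of inst.research, and it does not reflect how bulk_import() is implemented or what arguments it takes.

```r
# Hypothetical helper, not part of inst.research: read every .csv file in a
# directory into a named list of data frames using base R only.
read_csv_dir <- function(dir) {
  paths <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  tables <- lapply(paths, read.csv, stringsAsFactors = FALSE)
  # Name each list element after its file, without the extension.
  names(tables) <- tools::file_path_sans_ext(basename(paths))
  tables
}

# Usage: write two small files to a fresh temporary directory and read them back.
demo_dir <- tempfile("bulk_import_demo")
dir.create(demo_dir)
write.csv(data.frame(id = 1:2), file.path(demo_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:5), file.path(demo_dir, "b.csv"), row.names = FALSE)
imported <- read_csv_dir(demo_dir)
names(imported)
```

Returning a named list keeps each file's contents separate, so the tables can be inspected or combined later as needed.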

usm_labels() usage:

usm_labels() attaches labels to a data frame currently loaded in memory. Labels can be supplied in one of three ways: (1) by using the default MHEC labels included with the inst.research package; (2) by defining custom value labels in the R console; or (3) by loading an external data frame that contains the value-label pairings. The examples below cover each method in turn.

1. Attach default labels and variable names.

The “inst.research” package includes an unlabeled “example_dataset” (see ?example_dataset for more information). To attach the default MHEC label pairings to this dataset, follow the example below.

    ##### Example 1.
    library(inst.research)  # Attach the inst.research package.
    print(head(example_dataset, 12), row.names = FALSE)  # View the example dataset.
    # IDType Degree Gender UScitizen
    #      2     60      2         1
    #      2     60      2         2
    #      2     40      2         1
    #      3     81      2         1
    #      2     81      1         2
    #      2     81      1         2
    #      2     60      1         1
    #      3     81      2         1
    #      1     40      1         2
    #      1     40      2         1
    #      1     60      2         2
    #      2     40      2         2
    usm_labels(dataset = example_dataset, label_values = TRUE, label_variables = TRUE)
    # Note: there are options to relabel the values, the variables, or both.
    # See the usm_labels help page for all parameter options.
    print(head(output_file, 12), row.names = FALSE)  # View the example dataset with labels attached.
    # IDType    Degree Gender US Citizen
    #      1  Doc. R/S Female    Foreign
    #      2  Doc. R/S Female US Citizen
    #      2 Bachelors   Male US Citizen
    #      2   Masters   Male    Foreign
    #      2 Bachelors Female    Foreign
    #      1   Masters Female US Citizen
    #      1  Doc. R/S Female US Citizen
    #      1  Doc. R/S Female    Foreign
    #      2 Bachelors   Male US Citizen
    #      2   Masters   Male US Citizen
    #      1  Doc. R/S   Male    Foreign
    #      2 Bachelors   Male US Citizen

2. Define value-label pairs in R.

Value-label pairs can also be written directly in R, which is useful when the list of pairings is short. Follow the formatting shown in the example below.

    ##### Example 2.
    library(inst.research)  # Attach the inst.research package.
    data_def <- c("var.name_IDType", 1, "Student", 2, "faculty", 3, "staff",
                  "var.name_USCitizen", 1, "Yes", 2, "No",
                  "var.name_Gender", 1, "male", 2, "female",
                  "var.name_Degree", 40, "BA", 60, "MA", 81, "AA")
    # Note: each variable name must follow the "var.name_" prefix, and each value (e.g., 1, 2)
    # must be paired with a label (e.g., "Student"). Once the pairs are defined, pass the
    # object to the manual_label_input parameter of usm_labels(), as shown below.
    # "USCitizen" above differs in case from the column name "UScitizen", which is presumably
    # why that column is still unlabeled in the output below.
    # Attach the user-defined labels to the example dataset.
    usm_labels(dataset = example_dataset, label_variables = FALSE, label_values = FALSE,
               manual_label_input = data_def)
    print(head(output_file, 12), row.names = FALSE)  # View the example dataset with labels attached.
    #  IDType Degree Gender UScitizen
    # Student     AA female         2
    # faculty     AA female         1
    # faculty     BA   male         1
    # faculty     MA   male         2
    # faculty     BA female         2
    # Student     MA female         1
    # Student     AA female         1
    # Student     AA female         2
    # faculty     BA   male         1
    # faculty     MA   male         1
    # Student     AA   male         2
    # faculty     BA   male         1
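
To make the flat "var.name_" format easier to picture, the following base R sketch splits such a vector into one row per value-label pair. It is an illustration only: parse_def() is a hypothetical helper, not part of inst.research, and it assumes the vector starts with a "var.name_" entry and that every value is followed by exactly one label. It says nothing about how usm_labels() parses its input internally.

```r
# A definition vector in the "var.name_" format; c() coerces everything to character.
data_def <- c("var.name_Gender", 1, "male", 2, "female",
              "var.name_Degree", 40, "BA", 60, "MA", 81, "AA")

# Hypothetical helper, not part of inst.research.
parse_def <- function(def) {
  is_name <- startsWith(def, "var.name_")
  # Each "var.name_" entry starts a new group holding its value-label pairs.
  groups <- split(def, cumsum(is_name))
  rows <- lapply(groups, function(g) {
    pairs <- g[-1]
    data.frame(variable = sub("^var\\.name_", "", g[1]),
               value = pairs[c(TRUE, FALSE)],   # odd positions are values
               label = pairs[c(FALSE, TRUE)],   # even positions are labels
               stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, unname(rows))
  rownames(out) <- NULL
  out
}

pairs_table <- parse_def(data_def)
print(pairs_table, row.names = FALSE)
```

Laid out this way, a mismatch between a name in the definition and a column name in the dataset is easy to spot, for example with setdiff(pairs_table$variable, names(example_dataset)).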

3. Define value-label pairs using an external data.frame.

A data frame containing value-label pairs can also be used for relabeling. This is useful when a large number of value-label pairings are stored in an external file (e.g., a comma-separated file).

The value-label pairings must be in the following format:

    "Degree", "86", "Doc. Other",
    "Degree", "87", "Non-Deg Grad",
    "Degree", "99", "Multi Major",
    "DependStatus", "0", "Unknown",
    "DependStatus", "1", "Dependent",
    "DependStatus", "2", "Independent",
    "DistEdFlag", "1", "Exclusively",
    "DistEdFlag", "2", "Some",
    "Gender", "1", "Male",
    "Gender", "2", "Female"

Note: Each line must begin with the variable name corresponding to the value-label pair.
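
As a rough sketch of how pairings in this layout could be loaded, base R's read.csv() with header = FALSE yields the default column names V1, V2, and V3. The sample lines are inlined here (without trailing commas) only to keep the sketch self-contained; with a real file, its path would be passed as the first argument instead of text. The package documentation should be checked for any further requirements on the label data frame.

```r
# Three value-label lines in the layout described above, inlined for illustration.
label_text <- '"Degree", "86", "Doc. Other"
"Gender", "1", "Male"
"Gender", "2", "Female"'

# header = FALSE: the data has no header row, so columns are named V1, V2, V3.
# strip.white = TRUE: drop the spaces that follow each comma.
# colClasses = "character": keep the coded values as text.
external_labels <- read.csv(text = label_text, header = FALSE,
                            strip.white = TRUE, colClasses = "character")
print(external_labels, row.names = FALSE)
```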

The “inst.research” package includes an unlabeled “example_dataset” (?example_dataset for more info) which we can combine with a second included dataset called “example_external_labels”. You can try this process using the procedure shown in the following example.

    ##### Example 3.
    ##### Step 1.
    # Load the inst.research package and import your value-label pairings into R (e.g., with
    # read_csv()). Because inst.research already includes an example labels data frame, the
    # import can be skipped here. Both example datasets can be viewed with print().
    library(inst.research)  # Attach the inst.research package.
    print(head(example_dataset, 12), row.names = FALSE)  # View the example dataset.
    # IDType Degree Gender UScitizen
    #      2     60      2         1
    #      2     60      2         2
    #      2     40      2         1
    #      3     81      2         1
    #      2     81      1         2
    #      2     81      1         2
    #      2     60      1         1
    #      3     81      2         1
    #      1     40      1         2
    #      1     40      2         1
    #      1     60      2         2
    #      2     40      2         2
    print(example_external_labels, row.names = FALSE)  # View the example external labels.
    #           V1 V2          V3
    #       Degree 40          BA
    #       Degree 60          MA
    #       Degree 81          AA
    # DependStatus  0     Unknown
    # DependStatus  1   Dependent
    # DependStatus  2 Independent
    #   DistEdFlag  1 Exclusively
    #   DistEdFlag  2        Some
    #       Gender  1        Male
    #       Gender  2      Female
    #    UScitizen  1         Yes
    #    UScitizen  2          No
    ##### Step 2.
    # Label the example_dataset using the usm_labels() function.
    usm_labels(dataset = example_dataset, label_variables = FALSE, label_values = FALSE,
               label_matrix = example_external_labels)
    # You can then view the results below.
    print(head(output_file, 15), row.names = FALSE)  # View the labeled example dataset.
    # IDType Degree Gender UScitizen
    #      1     AA Female        No
    #      2     AA Female       Yes
    #      2     BA   Male       Yes
    #      2     MA   Male        No
    #      2     BA Female        No
    #      1     MA Female       Yes
    #      1     AA Female       Yes
    #      1     AA Female        No
    #      2     BA   Male       Yes
    #      2     MA   Male       Yes
    #      1     AA   Male        No
    #      2     BA   Male       Yes
    #      3     AA Female       Yes
    #      2     BA   Male       Yes
    #      3     BA   Male        No
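
For readers curious about what relabeling from a three-column table involves, the following base R sketch performs a comparable lookup with match(). It is an illustration under stated assumptions, not the package's implementation: apply_labels() is a hypothetical helper, and the small data frames are made up for the demonstration, using the Degree and Gender pairings shown in example_external_labels.

```r
# Hypothetical helper, not part of inst.research: replace coded values with
# labels using a three-column table (variable, value, label).
apply_labels <- function(data, labels) {
  names(labels) <- c("variable", "value", "label")
  # Only columns that appear in the label table are touched.
  for (col in intersect(names(data), unique(labels$variable))) {
    pairs <- labels[labels$variable == col, ]
    idx <- match(as.character(data[[col]]), as.character(pairs$value))
    # Values without a matching pair are kept as they are (converted to text).
    data[[col]] <- ifelse(is.na(idx),
                          as.character(data[[col]]),
                          as.character(pairs$label[idx]))
  }
  data
}

# Made-up demonstration data in the same shape as the examples above.
raw <- data.frame(IDType = c(2, 2, 3), Degree = c(60, 40, 81), Gender = c(2, 2, 1))
labels <- data.frame(
  V1 = c("Degree", "Degree", "Degree", "Gender", "Gender"),
  V2 = c("40", "60", "81", "1", "2"),
  V3 = c("BA", "MA", "AA", "Male", "Female"),
  stringsAsFactors = FALSE
)
labeled <- apply_labels(raw, labels)
print(labeled, row.names = FALSE)
```

Because IDType has no entries in the label table, it is left as numeric codes, which mirrors the unlabeled IDType column in the Example 3 output.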