Complex Sampl.R




About Complex Sampl.R

In research and in Monitoring & Evaluation (M&E), surveys and studies often require sampling. Although simple random sampling (SRS) is considered the 'gold standard', it is often not logistically feasible. When no sampling frame of Ultimate Sampling Units (USUs) exists for the area of interest, or when reaching communities requires substantial travel time, a two-stage cluster sampling design with probability of selection proportional to size is often used instead. This approach helps researchers conserve resources.
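The appeal of PPS with a fixed cluster size can be seen from the selection probabilities. The derivation below is the standard textbook result, not something taken from the app. It assumes M_i is the number of USUs in PSU i, M is the total across all PSUs, n PSUs are selected with probability proportional to M_i, and exactly m USUs are then drawn at random within each selected PSU (with m <= M_i and n M_i <= M).

```latex
\Pr(\text{PSU } i \text{ selected}) = \frac{n M_i}{M},
\qquad
\Pr(\text{USU } j \text{ selected} \mid \text{PSU } i \text{ selected}) = \frac{m}{M_i}

\Pr(\text{USU } j \text{ selected}) = \frac{n M_i}{M} \cdot \frac{m}{M_i} = \frac{n m}{M}
```

Under these assumptions every USU has the same overall chance of selection, so the sample is self-weighting like an SRS of n*m units, although clustering usually increases the variance (the design effect).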

This app selects the clusters for a two-stage Probability Proportional to Size (PPS) cluster sampling design. All that is needed is a .csv file (.xlsx is not yet supported) with at least one column labeled 'population'. This column should contain the population, or number of USUs, within each cluster or primary sampling unit (PSU).
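The input requirement above can be checked before sampling. The following is a minimal Python sketch, not the app's own (R) loader: the helper name load_psus and the example column 'village' are hypothetical. It only confirms that a 'population' column exists and that each value is a non-negative whole number.

```python
import csv
import io


def load_psus(csv_text):
    """Parse CSV text into rows, requiring a whole-number 'population' column.

    Hypothetical helper for illustration; the app's own loader may differ.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or "population" not in reader.fieldnames:
        raise ValueError("CSV must contain a column named 'population'")
    rows = []
    for row_no, row in enumerate(reader, start=1):  # row 1 is the first data row
        try:
            row["population"] = int(row["population"])
        except (TypeError, ValueError):
            raise ValueError(f"data row {row_no}: 'population' must be a whole number") from None
        if row["population"] < 0:
            raise ValueError(f"data row {row_no}: 'population' cannot be negative")
        rows.append(row)
    return rows


example = """village,population
A,120
B,45
C,300
"""
rows = load_psus(example)
print([r["population"] for r in rows])  # [120, 45, 300]
```

Other columns (here 'village') are kept untouched, matching the note that extra columns may be included.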

If a PSU contains fewer USUs than the cluster size, the app automatically combines it with the following row(s) until there are enough USUs to fill the cluster.
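The row-pooling behaviour described above can be sketched as follows. This is an illustrative Python sketch under the assumption that PSUs are processed in file order; the function name pool_rows is hypothetical. It returns None when the file runs out first, whereas the app itself simply stops in that case.

```python
def pool_rows(populations, start, cluster_size):
    """Return the row indices pooled, starting at `start`, until their
    combined population reaches `cluster_size`.

    Hypothetical sketch; returns None if the end of the list is reached first.
    """
    pooled = []
    total = 0
    for i in range(start, len(populations)):
        pooled.append(i)
        total += populations[i]
        if total >= cluster_size:
            return pooled
    return None


pops = [120, 45, 30, 80, 300]
print(pool_rows(pops, 0, 100))  # [0]        120 >= 100 on its own
print(pool_rows(pops, 1, 100))  # [1, 2, 3]  45 + 30 + 80 = 155
print(pool_rows(pops, 4, 400))  # None       only 300 USUs remain
```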

Known limitations

  • First, when a PSU's population is smaller than the cluster size, the code only combines it with subsequent rows in the dataset. If the loop reaches the end of the .csv file with no rows left, it stops, even if there are too few USUs for the cluster.
  • Second, the loop only sums the population of a group of consecutive PSUs until the total equals or exceeds the cluster size. In rare cases, this means a PSU may be assigned more USUs than it contains. For example, with a cluster size of 100, if one PSU has 80 USUs and the next has 30, the loop recognizes that together they contain more than 100 USUs and assigns half of the cluster (50) to each: 50 from the first PSU (80 USUs) and 50 from the next, which has only 30.
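The second limitation can be reproduced with the numbers from the example. In this Python sketch, equal_split assumes the cluster is divided evenly across the pooled PSUs, as the example describes, and capped_split is a hypothetical capacity-aware alternative that fills each PSU in order without exceeding its population. Neither function is taken from the app.

```python
def equal_split(populations, cluster_size):
    """Split a cluster evenly across pooled PSUs (assumed current behaviour)."""
    n = len(populations)
    base, extra = divmod(cluster_size, n)
    return [base + (1 if i < extra else 0) for i in range(n)]


def capped_split(populations, cluster_size):
    """Hypothetical alternative: fill PSUs in order, never exceeding a PSU's population."""
    allocation = []
    remaining = cluster_size
    for pop in populations:
        take = min(pop, remaining)
        allocation.append(take)
        remaining -= take
    if remaining > 0:
        raise ValueError("pooled PSUs hold fewer USUs than the cluster size")
    return allocation


def over_allocated(populations, allocation):
    """Indices of PSUs asked for more USUs than they contain."""
    return [i for i, (pop, take) in enumerate(zip(populations, allocation)) if take > pop]


pooled = [80, 30]
naive = equal_split(pooled, 100)
print(naive, over_allocated(pooled, naive))    # [50, 50] [1]
capped = capped_split(pooled, 100)
print(capped, over_allocated(pooled, capped))  # [80, 20] []
```

Filling in order is only one possible fix; allocating in proportion to each PSU's population would be another, at the cost of handling rounding.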

Both issues should be infrequent and will be addressed in future versions. The source code for this app is available under a GNU license at https://github.com/jwilliamrozelle/figuredio.

Instructions

  1. Upload a .csv file using the 'Browse...' button. The file must contain (at least) a column called 'population' giving the number of USUs in each PSU. Other columns may be included as desired. You may use the example data (downloadable below) to test the app or as a template.
  2. Enter the number of PSUs in your sample and the number of USUs to sample in each PSU (the cluster size).
  3. Click 'Generate Random Start'.
  4. (Optional) After a random start has been generated, you may enter your own random start instead. This is useful when reproducing a previous selection.
  5. (Optional) At this time, to stratify your sample, upload each stratum as a separate .csv file and follow steps 1-4 for each stratum.
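The random start in steps 3 and 4 suggests systematic PPS selection. The Python sketch below assumes the textbook version of that method rather than the app's exact code: the sampling interval is the total population divided by the number of clusters, the random start is a whole number from 1 to the interval, and each selection point falls in the PSU whose cumulative population first reaches it. The function name systematic_pps and its seed parameter are hypothetical.

```python
import bisect
import random
from fractions import Fraction
from itertools import accumulate


def systematic_pps(populations, n_clusters, random_start=None, seed=None):
    """Return (random_start, selected PSU indices) for systematic PPS sampling.

    Hypothetical sketch, not the app's code. A PSU larger than the sampling
    interval can be selected more than once (i.e. receive several clusters).
    """
    total = sum(populations)
    if n_clusters < 1 or total < n_clusters:
        raise ValueError("need at least one cluster and at least one USU per cluster")
    interval = Fraction(total, n_clusters)  # exact, so the last point never overshoots
    if random_start is None:
        # A seeded generator gives the same start for the same seed.
        random_start = random.Random(seed).randint(1, total // n_clusters)
    if not 1 <= random_start <= interval:
        raise ValueError("random start must lie between 1 and the sampling interval")
    cumulative = list(accumulate(populations))
    points = [random_start + k * interval for k in range(n_clusters)]
    # PSU i is selected by point p when cumulative[i-1] < p <= cumulative[i].
    return random_start, [bisect.bisect_left(cumulative, p) for p in points]


pops = [120, 45, 300, 80, 455]  # total 1000; 4 clusters -> interval 250
print(systematic_pps(pops, 4, random_start=100))  # (100, [0, 2, 4, 4])

# Stratified design: run the same selection separately for each stratum.
strata = {"urban": [300, 500, 200], "rural": [40, 60, 80, 20]}
starts = {"urban": 250, "rural": 50}
for name, stratum_pops in strata.items():
    print(name, systematic_pps(stratum_pops, 2, random_start=starts[name]))
# urban (250, [0, 1])
# rural (50, [1, 2])
```

Supplying a previous random_start reproduces that selection exactly, which mirrors the purpose of step 4.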

Example data

Download Template

Metadata About the Sample Result

Currently Under Development