About Complex Sampl.R

In research and in Monitoring & Evaluation, sampling is a fact of life. While a simple random sample (SRS) is the 'gold standard' approach, it is often not practical. A multi-stage cluster sampling design with probability of selection proportional to size (PPS) can be used when compiling a complete sampling frame is cost-prohibitive or impossible. Because fieldwork is concentrated in a limited number of clusters, this approach also helps investigators conserve resources that would otherwise be spent on fuel and travel time.

This app is designed to select clusters for a two-stage Probability Proportional to Size (PPS) cluster sampling design. The only requirement is a .csv file (xlsx is not yet supported) with at least one column labeled 'population'. This column should contain the population, or number of Ultimate Sampling Units (USUs), within each Primary Sampling Unit (PSU).

If the number of USUs in a given PSU is smaller than the cluster size entered, the app will automatically sample from the next row(s) until there are sufficient USUs for a cluster.
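
To make the selection step concrete, here is a minimal Python sketch of systematic PPS selection over the 'population' column. It is an illustration under stated assumptions, not the app's source code: the function name pps_systematic and its parameters are hypothetical, and drawing the random start uniformly between 0 and the sampling interval is an assumption about how such a start is usually chosen.

```python
import random


def pps_systematic(populations, n_clusters, random_start=None, seed=None):
    """Select PSU row indices by systematic PPS sampling (illustrative sketch).

    populations  -- USU counts, one per PSU (the 'population' column)
    n_clusters   -- number of clusters (PSUs) to select
    random_start -- optional fixed start, useful for reproducing a selection
    Returns (selected_indices, sampling_interval, random_start).
    """
    if n_clusters < 1 or not populations:
        raise ValueError("need at least one PSU and one cluster")
    total = sum(populations)
    interval = total / n_clusters  # sampling interval
    if random_start is None:
        # Assumption: the start is drawn uniformly within one interval.
        random_start = random.Random(seed).uniform(0, interval)

    selected = []
    idx = 0
    cumulative = populations[0]
    for k in range(n_clusters):
        point = random_start + k * interval
        # Walk down the rows until the cumulative population covers the point.
        while point > cumulative and idx < len(populations) - 1:
            idx += 1
            cumulative += populations[idx]
        selected.append(idx)
    return selected, interval, random_start


# Four PSUs, two clusters, fixed start so the result is reproducible.
example_selected, example_interval, _ = pps_systematic(
    [50, 150, 100, 200], n_clusters=2, random_start=100
)
print(example_selected, example_interval)
```

In this sketch a PSU whose population exceeds the sampling interval can be selected more than once, and a PSU with a population of zero is never selected.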

Known limitations

  • First, when the population of a selected PSU is smaller than the cluster size, the code currently draws only on subsequent rows in the dataset. Therefore, if a PSU is selected near the bottom of the .csv file, the loop may run out of rows and stop prematurely. The result would be a cluster assigned more USUs than the selected PSU(s) actually contain.
  • Second, the loop currently checks only the summed population of a group of PSUs, adding PSUs until there are at least as many USUs available as the cluster size. In rare cases, this means that more USUs may be sampled from a PSU than it contains. For example, suppose the desired cluster size is 100, a selected PSU (Community A) contains 80 USUs, and the following PSU (Community B) contains 30. The loop would recognize that Community A and Community B together contain more than 100 USUs. It would then divide the 100 USUs equally between the 2 PSUs, assigning 50 to be taken from Community A (which contains 80 USUs) and 50 to be taken from Community B (though it contains only 30).
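
The second limitation can be reproduced in a few lines of Python. The sketch below is illustrative only: equal_split mimics the equal division described above, overdrawn flags PSUs asked for more USUs than they contain, and capped_split is one hypothetical alternative (fill from each PSU in order, never exceeding its population). None of these names come from the app, and capped_split is not necessarily how future versions will address the issue.

```python
def equal_split(populations, cluster_size):
    """Divide the cluster equally across the grouped PSUs, as described above."""
    share = cluster_size / len(populations)
    return [share] * len(populations)


def overdrawn(populations, allocation):
    """Return the indices of PSUs asked for more USUs than they contain."""
    return [
        i
        for i, (pop, take) in enumerate(zip(populations, allocation))
        if take > pop
    ]


def capped_split(populations, cluster_size):
    """Hypothetical alternative: fill the cluster PSU by PSU, capped at each
    PSU's population, so that no PSU is overdrawn."""
    remaining = cluster_size
    allocation = []
    for pop in populations:
        take = min(pop, remaining)
        allocation.append(take)
        remaining -= take
    return allocation


group = [80, 30]  # Community A, Community B
naive = equal_split(group, 100)
print(naive, overdrawn(group, naive))
print(capped_split(group, 100))
```

With the Community A and Community B figures, the equal split asks Community B for 50 USUs although it contains 30, while the capped version takes 80 from Community A and the remaining 20 from Community B.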

Both of these issues should be infrequent, but will be addressed in future versions. Find the source code for this app under a GNU license at https://github.com/jwilliamrozelle/figuredio.


  1. (Optional) At this time, if you wish to stratify your sample, each stratum can be uploaded as a separate csv. You may then follow steps 2-6 for each stratum.
  2. Upload a csv file using the 'Browse...' button. Your csv file must (at least) contain a column called 'population', containing the number of USUs in that PSU. Other columns may be included as desired. If you wish, you may use the Example data (downloadable below) to test the app, or as a template.
  3. Input the number of PSUs in your sample and the number of USUs to sample in each PSU (the cluster size).
  4. Click 'Generate Random Start'.
  5. (Optional) After generating a random start, you may select your own random start. This may be useful when trying to reproduce a previous selection.
  6. Download the generated sample in xlsx format by clicking 'Download Sample Result'. The two sheets are:
    • Sample: Containing all uploaded info with additional columns detailing the selection
    • Sample Info: Containing documentation of your inputs, the sample size and sampling interval
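
Before uploading, it can help to confirm that a file meets the requirement in step 2. The following Python sketch shows one way to check for the 'population' column; the function name read_populations is hypothetical, and the rule that every value must be a non-negative whole number is an assumption rather than documented app behavior.

```python
import csv
import io


def read_populations(csv_text):
    """Parse CSV text and return (rows, populations).

    Raises ValueError if the 'population' column is missing or if a value
    is not a non-negative whole number (an assumed rule, see above).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or "population" not in reader.fieldnames:
        raise ValueError("CSV must contain a column named 'population'")

    rows = list(reader)
    populations = []
    # Data rows start on line 2 of the file; line 1 is the header.
    for line_no, row in enumerate(rows, start=2):
        try:
            value = int(row["population"])
        except (TypeError, ValueError):
            raise ValueError(f"line {line_no}: 'population' is not a whole number")
        if value < 0:
            raise ValueError(f"line {line_no}: 'population' is negative")
        populations.append(value)
    return rows, populations


sample_csv = "community,population\nCommunity A,80\nCommunity B,30\n"
sample_rows, sample_populations = read_populations(sample_csv)
print(sample_populations)
```

Other columns, such as the hypothetical 'community' column here, are passed through untouched in rows, consistent with the note that additional columns may be included as desired.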

Example data

Download Template

Meta Data about Sample Result