Title: | Network Scale Up Method |
---|---|
Description: | A Bayesian framework for population group size estimation using the Network Scale Up Method (NSUM). Size estimates are based on a random degree model and include options to adjust for barrier and transmission effects. |
Authors: | Rachael Maltiel and Aaron J. Baraff |
Maintainer: | Aaron J. Baraff <ajbaraff at uw.edu> |
License: | GPL-2 | GPL-3 |
Version: | 1.0.1 |
Built: | 2025-01-22 02:54:48 UTC |
Source: | https://github.com/mbojan/NSUM |
A Bayesian framework for subpopulation size estimation using the Network Scale Up Method (NSUM). Size estimates are based on a random degree model and include options to adjust for barrier and transmission effects.
Package: | NSUM |
Type: | Package |
Version: | 1.0 |
Date: | 2014-12-17 |
License: | GPL-2 | GPL-3 |
The main estimation function is nsum.mcmc. It produces a Markov chain Monte Carlo (MCMC) sample from the posterior distributions of the subpopulation size parameters from a random degree model based upon the Network Scale Up Method (NSUM). Options allow for the inclusion of barrier and transmission effects, both separately and combined, resulting in four models altogether. Also included are functions to simulate data from any of these four models (nsum.simulate) and to estimate reasonable starting values for the MCMC sampler (killworth.start). Two data sets have been provided for testing purposes (McCarty and Curitiba).
Rachael Maltiel and Aaron J. Baraff
Maintainer: Aaron J. Baraff <ajbaraff at uw.edu>
Killworth, P., Johnsen, E., McCarty, C., Shelley, G., and Bernard, H. (1998a), "A Social Network Approach to Estimating Seroprevalence in the United States," Social Networks, 20, 23-50.
Killworth, P., McCarty, C., Bernard, H., Shelley, G., and Johnsen, E. (1998b), "Estimation of Seroprevalence, Rape, and Homelessness in the United States using a Social Network Approach," Evaluation Review, 22, 289-308.
Maltiel, R., Raftery, A. E., McCormick, T. H., and Baraff, A. J., "Estimating Population Size Using the Network Scale Up Method." CSSS Working Paper 129. Retrieved from https://www.csss.washington.edu/Papers/2013/wp129.pdf
McCarty, C., Killworth, P. D., Bernard, H. R., Johnsen, E. C., and Shelley, G. A. (2001), "Comparing Two Methods for Estimating Network Size," Human Organization, 60, 28-39.
Salganik, M., Fazito, D., Bertoni, N., Abdo, A., Mello, M., and Bastos, F. (2011a), "Assessing Network Scale-up Estimates for Groups Most at Risk of HIV/AIDS: Evidence From a Multiple-Method Study of Heavy Drug Users in Curitiba, Brazil," American Journal of Epidemiology, 174, 1190-1196.
killworth.start, nsum.mcmc, nsum.simulate
## load data
data(McCarty)
## simulate from model with barrier effects
sim.bar <- with(McCarty, nsum.simulate(100, known, unknown, N, model="barrier", mu, sigma, rho))
## estimate unknown population size
dat.bar <- sim.bar$y
mcmc <- with(McCarty, nsum.mcmc(dat.bar, known, N, model="barrier", iterations=100, burnin=50))
## view posterior distribution
hist(mcmc$NK.values[1,])
This dataset contains the subpopulation sizes and parameters used for simulations involving the Curitiba data.
data("Curitiba")
A list with the following 7 variables.
known |
a vector of positive numbers, the sizes of known subpopulations. |
unknown |
a vector of positive numbers, the sizes of unknown subpopulations. |
N |
a positive number, the (known) total population size. |
mu |
a real number, the location parameter for the log-normal distribution of network degrees, with default 5. |
sigma |
a positive number, the scale parameter for the log-normal distribution of network degrees, with default 1. |
rho |
a vector of numbers between 0 and 1 with length equal to the total number of subpopulations, known and unknown, the dispersion parameters for the barrier effects, with defaults 0.1. |
tauK |
a vector of numbers between 0 and 1 with length equal to the total number of unknown subpopulations, the multipliers for the transmission biases, with defaults 1. |
The Curitiba dataset consists of 500 adult residents of Curitiba, Brazil and was collected through a household-based random sample in 2010.
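The component descriptions above imply some length constraints that are easy to check before passing a list of this shape to the simulation function. The following is a minimal sketch using a hypothetical stand-in list named toy; every number in it is made up for illustration and none is taken from the Curitiba data.

```r
# Hypothetical stand-in with the same seven components as the packaged list.
# All values are invented for illustration; they are NOT the Curitiba values.
toy <- list(
  known   = c(5000, 12000, 800),  # sizes of known subpopulations
  unknown = c(3000),              # sizes of unknown subpopulations
  N       = 1e6,                  # total population size
  mu      = 5,                    # log-normal location for network degrees
  sigma   = 1,                    # log-normal scale for network degrees
  rho     = rep(0.1, 4),          # one barrier dispersion per subpopulation
  tauK    = rep(1, 1)             # one transmission multiplier per unknown subpopulation
)

# Consistency checks implied by the component descriptions
n.known   <- length(toy$known)
n.unknown <- length(toy$unknown)
stopifnot(
  length(toy$rho)  == n.known + n.unknown,
  length(toy$tauK) == n.unknown,
  all(toy$rho > 0 & toy$rho < 1),
  all(toy$tauK > 0 & toy$tauK <= 1),
  all(c(toy$known, toy$unknown) < toy$N),
  toy$sigma > 0
)
```

The same checks should apply to any list built by hand for use with the simulation examples, since rho has one entry per subpopulation while tauK has one entry per unknown subpopulation only.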
Salganik, M., Fazito, D., Bertoni, N., Abdo, A., Mello, M., and Bastos, F. (2011a), "Assessing Network Scale-up Estimates for Groups Most at Risk of HIV/AIDS: Evidence From a Multiple-Method Study of Heavy Drug Users in Curitiba, Brazil," American Journal of Epidemiology, 174, 1190-1196.
## load data
data(Curitiba)
## simulate from model with transmission bias
sim.trans <- with(Curitiba, nsum.simulate(100, known, unknown, N, model="transmission", mu, sigma, tauK))
This function calculates the Killworth estimates for unknown subpopulation sizes based on NSUM data.
killworth(dat, known, N)
dat |
a matrix of non-negative integers, the |
known |
a vector of positive numbers, the sizes of known subpopulations. All additional columns of |
N |
a positive number, the (known) total population size. |
The function killworth allows for the estimation of subpopulation sizes from Killworth's network scale-up model. These estimates can be used to compare with the MCMC results in this package. For reasonable starting values for the MCMC function nsum.mcmc, see the function killworth.start.
A vector of positive numbers with length equal to the number of unknown subpopulations, the Killworth estimates of the subpopulation sizes.
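To make the scale-up idea concrete, here is a minimal base R sketch of the classical Killworth estimator: each respondent's degree is estimated from ties to the known subpopulations, and each unknown size is then the share of all estimated ties that go to that group, scaled by N. The function name killworth_sketch is hypothetical, the toy data are made up, and the sketch assumes that the known subpopulations occupy the first columns of dat (as the default for indices.k in nsum.mcmc suggests). The packaged killworth function may differ in its details.

```r
# Sketch of the Killworth scale-up estimator in base R.
# `killworth_sketch` is a hypothetical name, not a function from the package.
killworth_sketch <- function(dat, known, N) {
  # Assumption: the first length(known) columns of dat are the known groups
  known.cols   <- seq_len(length(known))
  unknown.cols <- setdiff(seq_len(ncol(dat)), known.cols)
  # Degree estimate: scale each respondent's known-group ties up to the population
  d.hat <- N * rowSums(dat[, known.cols, drop = FALSE]) / sum(known)
  # Size estimate: share of all estimated ties that go to each unknown group
  N * colSums(dat[, unknown.cols, drop = FALSE]) / sum(d.hat)
}

# Toy data: 3 respondents, 2 known groups (columns 1-2), 1 unknown group (column 3)
dat <- matrix(c(2, 1, 1,
                0, 3, 0,
                1, 1, 2), nrow = 3, byrow = TRUE)
est <- killworth_sketch(dat, known = c(1000, 3000), N = 100000)
print(est)
```

In this toy case the respondents report 8 ties to known groups of total size 4000 and 3 ties to the unknown group, so the estimate is 4000 * 3 / 8 = 1500.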
Rachael Maltiel and Aaron J. Baraff
Maintainer: Aaron J. Baraff <ajbaraff at uw.edu>
Killworth, P., Johnsen, E., McCarty, C., Shelley, G., and Bernard, H. (1998a), "A Social Network Approach to Estimating Seroprevalence in the United States," Social Networks, 20, 23-50.
Killworth, P., McCarty, C., Bernard, H., Shelley, G., and Johnsen, E. (1998b), "Estimation of Seroprevalence, Rape, and Homelessness in the United States using a Social Network Approach," Evaluation Review, 22, 289-308.
## load data
data(McCarty)
## simulate from model with barrier effects
sim.bar <- with(McCarty, nsum.simulate(100, known, unknown, N, model="barrier", mu, sigma, rho))
## estimate unknown population sizes
dat.bar <- sim.bar$y
NK.killworth <- with(McCarty, killworth(dat.bar, known, N))
This function uses the Killworth estimates to calculate reasonable starting values for the MCMC estimation.
killworth.start(dat, known, N)
dat |
a matrix of non-negative integers, the |
known |
a vector of positive numbers, the sizes of known subpopulations. All additional columns of |
N |
a positive number, the (known) total population size. |
The function killworth.start estimates reasonable starting values for many of the parameters in the MCMC function nsum.mcmc based on Killworth's network scale-up model. These are the default starting values where applicable. For simple subpopulation size estimation using Killworth's model, see the function killworth.
A list with four components:
NK.start |
a vector of positive numbers with length equal to the total number of unknown subpopulations, the starting values for the sizes of the unknown subpopulations. |
d.start |
a vector of positive numbers with length equal to the number of individuals, the starting values for the network degrees. |
mu.start |
a real number, the starting value for the location parameter for the log-normal distribution of network degrees. |
sigma.start |
a positive number, the starting value for the scale parameter for the log-normal distribution of network degrees. |
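One plausible way to obtain the four components listed above is sketched below in base R. This is an illustration under stated assumptions, not the package's implementation: the function name killworth_start_sketch is hypothetical, the known subpopulations are assumed to occupy the first columns of dat, degrees are floored at 1 so that their logarithms stay finite, and mu.start and sigma.start are taken to be the mean and standard deviation of the log degrees.

```r
# Sketch of Killworth-style starting values in base R.
# `killworth_start_sketch` is a hypothetical name, not a function from the package.
killworth_start_sketch <- function(dat, known, N) {
  # Assumption: the first length(known) columns of dat are the known groups
  known.cols   <- seq_len(length(known))
  unknown.cols <- setdiff(seq_len(ncol(dat)), known.cols)
  # Killworth degree estimates from ties to the known groups
  d.raw <- N * rowSums(dat[, known.cols, drop = FALSE]) / sum(known)
  # Assumption: floor degrees at 1 so that log() stays finite
  d.start <- pmax(d.raw, 1)
  # Killworth size estimates for the unknown groups
  NK.start <- N * colSums(dat[, unknown.cols, drop = FALSE]) / sum(d.raw)
  # Assumption: moment estimates on the log scale for the log-normal parameters
  list(NK.start    = NK.start,
       d.start     = d.start,
       mu.start    = mean(log(d.start)),
       sigma.start = sd(log(d.start)))
}

# Toy data: 3 respondents, 2 known groups (columns 1-2), 1 unknown group (column 3)
dat <- matrix(c(2, 1, 1,
                0, 3, 0,
                1, 1, 2), nrow = 3, byrow = TRUE)
start <- killworth_start_sketch(dat, known = c(1000, 3000), N = 100000)
str(start)
```

The returned list has the same component names as the value described above, so it can be compared against the output of killworth.start on the same data.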
Rachael Maltiel and Aaron J. Baraff
Maintainer: Aaron J. Baraff <ajbaraff at uw.edu>
Killworth, P., Johnsen, E., McCarty, C., Shelley, G., and Bernard, H. (1998a), "A Social Network Approach to Estimating Seroprevalence in the United States," Social Networks, 20, 23-50.
Killworth, P., McCarty, C., Bernard, H., Shelley, G., and Johnsen, E. (1998b), "Estimation of Seroprevalence, Rape, and Homelessness in the United States using a Social Network Approach," Evaluation Review, 22, 289-308.
Maltiel, R., Raftery, A. E., McCormick, T. H., and Baraff, A. J., "Estimating Population Size Using the Network Scale Up Method." CSSS Working Paper 129. Retrieved from https://www.csss.washington.edu/Papers/2013/wp129.pdf
## load data
data(McCarty)
## simulate from model with barrier effects
sim.bar <- with(McCarty, nsum.simulate(100, known, unknown, N, model="barrier", mu, sigma, rho))
## estimate Killworth starting values
dat.bar <- sim.bar$y
start <- with(McCarty, killworth.start(dat.bar, known, N))
## estimate unknown population size from MCMC
mcmc <- with(McCarty, nsum.mcmc(dat.bar, known, N, model="barrier",
                                iterations=100, burnin=50,
                                NK.start=start$NK.start, d.start=start$d.start,
                                mu.start=start$mu.start, sigma.start=start$sigma.start))
This dataset contains the subpopulation sizes and parameters used for simulations involving the McCarty data.
data("McCarty")
A list with the following 7 variables.
known |
a vector of positive numbers, the sizes of known subpopulations. |
unknown |
a vector of positive numbers, the sizes of unknown subpopulations. |
N |
a positive number, the (known) total population size. |
mu |
a real number, the location parameter for the log-normal distribution of network degrees, with default 5. |
sigma |
a positive number, the scale parameter for the log-normal distribution of network degrees, with default 1. |
rho |
a vector of numbers between 0 and 1 with length equal to the total number of subpopulations, known and unknown, the dispersion parameters for the barrier effects, with defaults 0.1. |
tauK |
a vector of numbers between 0 and 1 with length equal to the total number of unknown subpopulations, the multipliers for the transmission biases, with defaults 1. |
The McCarty data set was obtained through random digit dialing within the United States. It contains responses from 1,375 adults from two surveys: survey 1 with 801 responses conducted in January 1998 and survey 2 with 574 responses conducted in January 1999.
Killworth, P., Johnsen, E., McCarty, C., Shelley, G., and Bernard, H. (1998a), "A Social Network Approach to Estimating Seroprevalence in the United States," Social Networks, 20, 23-50.
Killworth, P., McCarty, C., Bernard, H., Shelley, G., and Johnsen, E. (1998b), "Estimation of Seroprevalence, Rape, and Homelessness in the United States using a Social Network Approach," Evaluation Review, 22, 289-308.
## load data
data(McCarty)
## simulate from model with barrier effects
sim.bar <- with(McCarty, nsum.simulate(100, known, unknown, N, model="barrier", mu, sigma, rho))
This function produces an MCMC sample from the posterior distributions of the subpopulation size parameters from an NSUM model.
nsum.mcmc(dat, known, N, indices.k = (length(known)+1):(dim(dat)[2]), iterations = 1000, burnin = 100, size = iterations, model = "degree", ...)
dat |
a matrix of non-negative integers, the |
known |
a vector of positive numbers, the sizes of known subpopulations. |
N |
a positive number, the (known) total population size. |
indices.k |
a vector of positive integers, the indices of the columns of |
iterations |
a positive integer, the total number of MCMC iterations after burn-in, with default 1000. |
burnin |
a non-negative integer, the number of burn-in MCMC iterations, with default 100. |
size |
a positive integer, the number of MCMC iterations kept after thinning, with default equal to iterations. |
model |
a character string, the model to be simulated from. This must be one of "degree", "barrier", "transmission", or "combined". |
... |
additional arguments to be passed to methods, such as starting values, prior parameters, and tuning parameters. Many methods will accept the following arguments:
|
The function nsum.mcmc allows for the estimation of the various parameters from a random degree model based upon the Network Scale Up Method (NSUM) by producing Markov chain Monte Carlo (MCMC) samples from their posterior distributions. Options allow for the inclusion of barrier and transmission effects, both separately and combined, resulting in four models altogether. A large number of iterations may be required for accurate inference due to slow mixing, so the resulting chain can be thinned using the size argument. It should be noted that subpopulation size estimation in the presence of transmission bias can be greatly improved when the priors for the multipliers tauK are highly informative.
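As an illustration of what thinning a long chain down to size kept draws can look like, the base R sketch below keeps evenly spaced draws from a stand-in chain. The helper thin_chain is hypothetical and the evenly spaced rule is an assumption; the package's own thinning rule is not described here and may differ.

```r
# Sketch of thinning an MCMC chain to a fixed number of kept draws.
# `thin_chain` is a hypothetical helper, not a function from the package.
thin_chain <- function(chain, size) {
  iterations <- length(chain)
  stopifnot(size >= 1, size <= iterations)
  # Assumption: keep draws at (approximately) evenly spaced positions
  keep <- unique(round(seq(1, iterations, length.out = size)))
  chain[keep]
}

set.seed(1)
# Stand-in for a slowly mixing chain: a random walk with 10000 post-burn-in draws
chain   <- cumsum(rnorm(10000))
thinned <- thin_chain(chain, size = 500)
length(thinned)
```

Thinning does not add information, but it keeps the stored output manageable when many iterations are needed, which matters most for the per-individual quantities such as the degree chains.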
A list with up to nine components:
NK.values |
a matrix of positive numbers with a row for each unknown subpopulation, the thinned MCMC chains representing the posterior distributions of the sizes of the unknown subpopulations. |
d.values |
a matrix of positive numbers with a row for each individual, the thinned MCMC chains representing the posterior distributions of the network degrees. |
mu.values |
a vector of real numbers, the thinned MCMC chain representing the posterior distribution of the location parameter of the log-normal distribution of network degrees. |
sigma.values |
a vector of positive numbers, the thinned MCMC chain representing the posterior distribution of the scale parameter of the log-normal distribution of network degrees. |
rho.values |
a matrix of numbers between 0 and 1 with a row for each subpopulation, known and unknown, the thinned MCMC chains representing the posterior distributions of the dispersion parameters for the barrier effects. |
tauK.values |
a matrix of numbers between 0 and 1 with a row for each unknown subpopulation, the thinned MCMC chains representing the posterior distributions of the multipliers for the transmission biases. |
q.values |
a three-dimensional array of numbers between 0 and 1 with a row for each pairing of individual and subpopulation, the thinned MCMC chains representing the binomial probabilities of the number of people that the individual knows from the subpopulation. |
iterations |
a positive integer, the total number of MCMC iterations after burn-in. |
burnin |
a non-negative integer, the number of burn-in MCMC iterations. |
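Since NK.values stores one row per unknown subpopulation and one column per kept draw, posterior summaries reduce to row-wise operations. The sketch below uses a randomly generated stand-in matrix of the same shape rather than real sampler output; the numbers are invented and carry no substantive meaning.

```r
set.seed(42)
# Stand-in for mcmc$NK.values: 2 unknown subpopulations x 1000 kept draws.
# These draws are synthetic and only mimic the shape of real sampler output.
NK.values <- rbind(rlnorm(1000, meanlog = log(50000), sdlog = 0.2),
                   rlnorm(1000, meanlog = log(8000),  sdlog = 0.3))

# Row-wise posterior summaries: one row of output per unknown subpopulation
post.mean    <- rowMeans(NK.values)
post.median  <- apply(NK.values, 1, median)
cred.int     <- t(apply(NK.values, 1, quantile, probs = c(0.025, 0.975)))
post.summary <- cbind(mean = post.mean, median = post.median, cred.int)
print(round(post.summary))
```

The same pattern applies to the other matrix-valued components, such as rho.values and tauK.values, which are also organized with one row per subpopulation.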
Rachael Maltiel and Aaron J. Baraff
Maintainer: Aaron J. Baraff <ajbaraff at uw.edu>
Maltiel, R., Raftery, A. E., McCormick, T. H., and Baraff, A. J., "Estimating Population Size Using the Network Scale Up Method." CSSS Working Paper 129. Retrieved from https://www.csss.washington.edu/Papers/2013/wp129.pdf
## load data
data(McCarty)
## simulate from model with barrier effects
sim.bar <- with(McCarty, nsum.simulate(100, known, unknown, N, model="barrier", mu, sigma, rho))
## estimate unknown population size
dat.bar <- sim.bar$y
mcmc <- with(McCarty, nsum.mcmc(dat.bar, known, N, model="barrier", iterations=100, burnin=50))
## view posterior distribution of subpopulation sizes for the first subpopulation
hist(mcmc$NK.values[1,])
## view posterior distribution of barrier effect parameters for the first subpopulation
hist(mcmc$rho.values[1,])
This function simulates data from one of the four NSUM models.
nsum.simulate(n, known, unknown, N, model = "degree", ...)
n |
a non-negative integer, the number of respondents in the sample. |
known |
a vector of positive numbers, the sizes of known subpopulations. |
unknown |
a vector of positive numbers, the sizes of unknown subpopulations. |
N |
a positive number, the (known) total population size. |
model |
a character string, the model to be simulated from. This must be one of "degree", "barrier", "transmission", or "combined". |
... |
additional arguments to be passed to methods, such as starting values, prior parameters, and tuning parameters. Many methods will accept the following arguments:
|
The function nsum.simulate allows for the simulation of data from a random degree model based upon the Network Scale Up Method (NSUM). Options allow for the inclusion of barrier and transmission effects, both separately and combined, resulting in four models altogether. Each call to the function results in the simulation of a single realization of data.
A list with two components:
y |
a matrix of non-negative integers, the |
d |
a vector of positive numbers, the network degrees of the individuals. Only the integer parts were used for simulation. |
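For intuition about the simplest of the four models, the base R sketch below simulates from a plain random degree model: degrees are drawn from a log-normal distribution, and each respondent's count for a subpopulation is a binomial draw with the integer part of the degree as the number of trials and the subpopulation's population share as the success probability. The function name sim_degree_sketch and the example sizes are hypothetical, and the sketch omits barrier and transmission effects, so it is not a substitute for nsum.simulate.

```r
# Sketch of simulating from a random degree model in base R.
# `sim_degree_sketch` is a hypothetical name, not a function from the package.
sim_degree_sketch <- function(n, sizes, N, mu = 5, sigma = 1) {
  # Network degrees from a log-normal distribution
  d <- rlnorm(n, meanlog = mu, sdlog = sigma)
  # Probability that a given tie falls in each subpopulation
  p <- sizes / N
  # One binomial draw per respondent and subpopulation, using floor(d) as trials
  y <- sapply(p, function(pk) rbinom(n, size = floor(d), prob = pk))
  list(y = matrix(y, nrow = n), d = d)
}

set.seed(123)
# Made-up subpopulation sizes and total population size, for illustration only
sim <- sim_degree_sketch(n = 100, sizes = c(5000, 12000, 3000), N = 1e6)
dim(sim$y)
```

The result mirrors the two components described above: a count matrix y with one row per respondent and a degree vector d whose integer parts were used for the binomial draws.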
Rachael Maltiel and Aaron J. Baraff
Maintainer: Aaron J. Baraff <ajbaraff at uw.edu>
Maltiel, R., Raftery, A. E., McCormick, T. H., and Baraff, A. J., "Estimating Population Size Using the Network Scale Up Method." CSSS Working Paper 129. Retrieved from https://www.csss.washington.edu/Papers/2013/wp129.pdf
## load data
data(McCarty)
## simulate from model with barrier effects
sim.bar <- with(McCarty, nsum.simulate(100, known, unknown, N, model="barrier", mu, sigma, rho))
## simulate from model with both barrier effects and transmission biases
sim.comb <- with(McCarty, nsum.simulate(100, known, unknown, N, model="combined", mu, sigma, rho, tauK))
## extract data for use in MCMC
dat.bar <- sim.bar$y
## view degree distribution
hist(sim.bar$d)