% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/code_for_paper.r
\name{linkd}
\alias{linkd}
\title{Impute missing agreement patterns and link records}
\usage{
linkd(d, initial_m = NULL, initial_u = NULL, p_init = 0.5,
  fixed_col = NULL, alg = "m")
}
\arguments{
\item{d}{Matrix of agreement patterns, with a final column counting the number of times each pattern was observed. See Details.}

\item{initial_m}{Starting probabilities of per-field agreement for record pairs in which both records come from the same individual. Defaults to \code{NULL}.}

\item{initial_u}{Starting probabilities of per-field agreement for record pairs in which the two records come from different individuals. Defaults to \code{NULL}.}

\item{p_init}{Starting probability that both records in a randomly selected record pair are associated with the same individual.}

\item{fixed_col}{Vector indicating the columns that are not to be updated in the initial EM algorithm. Useful when good prior estimates of the mismatch probabilities are available. See Details.}

\item{alg}{Character; one of \code{'m'}, \code{'b'}, \code{'i'} or \code{'a'}. See Details.}
}
\value{
A list. The first component is a matrix whose last column holds the posterior probability that each record pair is a true match. The second component contains the fitted models used to generate the predicted probabilities.
}
\description{
Imputes missing agreement patterns in record-pair data and then links the data, returning the posterior probability that each record pair is a true match.
}
\details{
\code{d} is a numeric matrix with N rows corresponding to N record pairs and L+1 columns. The first L columns show the field agreement patterns observed over the record pairs, and the last column gives the total number of times that pattern was observed in the database. The code 0 is used for a field that differs between the two records, 1 for a field that agrees, and 2 for a missing field.

\code{fixed_col} indicates the components of the \code{u} vector (the per-field probabilities of agreement for two records from different individuals) that are not to be updated when the EM algorithm is applied to estimate the components of the Fellegi-Sunter model.

\code{alg} has four possible values:
\describe{
\item{\code{'m'}}{The default. Fits a log-linear model for the agreement counts only within the record pairs that correspond to the same individual.}
\item{\code{'b'}}{Fits a separate log-linear model for each of the 2 clusters.}
\item{\code{'i'}}{Corresponds to the original Fellegi-Sunter algorithm, with probabilities estimated via the EM algorithm.}
\item{\code{'a'}}{Fits all of the models listed above.}
}
}
\examples{

# Simulate data
m_probs <- rep(0.8,6)
u_probs <- rep(0.2,6)
means_match <- -1*qnorm(1-m_probs)
means_mismatch <- -1*qnorm(1-u_probs)
missingprobs <- rep(.2,6)
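# Simulate agreement patterns over 5 fields
thedata <- do_sim(cor_match = 0.2, cor_mismatch = 0, nsample = 10^4,
                  pi_match = 0.5, m_probs = rep(0.8, 5),
                  u_probs = rep(0.2, 5), missingprobs = rep(0.4, 5))
colnames(thedata) <- c(paste("V", 1:5, sep = "_"), "count")
# Fit with the default settings (alg = "m")
output <- linkd(thedata)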
output$fitted_probs
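
# Illustrative sketch of the input coding described in Details. The values
# below are hypothetical toy numbers, not output from do_sim(), and toy_d is
# a name introduced only for this illustration. Each row is an agreement
# pattern over L = 3 fields (0 = fields differ, 1 = fields agree,
# 2 = field missing); the final column counts how often the pattern occurred.
toy_d <- rbind(
  c(1, 1, 1, 40),
  c(1, 0, 2, 12),
  c(0, 0, 0, 95),
  c(2, 1, 0, 7)
)
colnames(toy_d) <- c(paste("V", 1:3, sep = "_"), "count")
# Sanity-check the coding before passing a matrix of this shape to linkd()
stopifnot(all(is.element(toy_d[, 1:3], c(0, 1, 2))),
          all(toy_d[, "count"] > 0))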
}
\keyword{EM}
\keyword{Fellegi/Sunter}
\keyword{algorithm}
\keyword{class}
\keyword{correlation}
\keyword{latent}
\keyword{linkage}
\keyword{probabilistic}

