The main focus of my research is assessing the accuracy of matching in data linkage, i.e. the likelihood that records matched from the two files actually belong to the same individual. We proposed a Markov chain based Monte Carlo simulation method for assessing linkage accuracy and illustrate the utility of the approach using ABS (Australian Bureau of Statistics) synthetic data in realistic data settings.
Given the current state of the chain, A(n), the next state, A(n+1), is constructed by a defined algorithm that maintains the internal consistency of the agreement patterns. The idea is to generate re-sampled versions of the agreement array in a way that preserves the underlying probabilistic linking structure.
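To make the object being re-sampled concrete, here is a minimal sketch of one common way to build an agreement array. It assumes a binary indicator per record pair and linking variable, which is not spelled out above. The function name `agreement_array`, the field names, and the toy records are hypothetical, and the re-sampling step that produces A(n+1) from A(n) is not shown.

```python
def agreement_array(file_x, file_y, linking_vars):
    """Return A where A[i][j][k] is 1 if record i of file_x and record j
    of file_y agree exactly on linking variable k, and 0 otherwise.

    Assumption: exact-match binary agreement; the actual array used in
    the research may be defined differently.
    """
    return [
        [[1 if rx[v] == ry[v] else 0 for v in linking_vars] for ry in file_y]
        for rx in file_x
    ]


# Hypothetical toy records, for illustration only.
file_x = [
    {"surname": "smith", "birth_year": 1980},
    {"surname": "jones", "birth_year": 1975},
]
file_y = [
    {"surname": "smith", "birth_year": 1980},
    {"surname": "jones", "birth_year": 1976},
]

A = agreement_array(file_x, file_y, ["surname", "birth_year"])
```

Each `A[i][j]` is the agreement pattern for one candidate record pair, so the array has one row per record in the first file and one column per record in the second.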
To assess the accuracy:
The proportion of correct links is investigated for each record under different blocking strategies.
The average proportion of correct links is observed as block size increases and over a range of cut-off values, with the aim of guiding the choice of block size and cut-off value and improving existing linking processes by achieving higher accuracy.
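The cut-off comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the method used in the research: it assumes each candidate pair carries a linkage score and a known true-match flag (as is possible with synthetic data), and that a pair is declared a link when its score is at or above the cut-off. The names `correct_link_proportion` and `sweep_cutoffs` and the toy scores are hypothetical.

```python
def correct_link_proportion(pairs, cutoff):
    """Proportion of declared links that are true matches.

    pairs: iterable of (score, is_true_match) tuples.
    A pair is declared a link when score >= cutoff (an assumption).
    Returns None when no pair reaches the cut-off.
    """
    linked = [is_true for score, is_true in pairs if score >= cutoff]
    if not linked:
        return None
    return sum(linked) / len(linked)


def sweep_cutoffs(pairs, cutoffs):
    """Evaluate the proportion of correct links at each cut-off value."""
    return {c: correct_link_proportion(pairs, c) for c in cutoffs}


# Hypothetical scores and truth flags, for illustration only.
toy_pairs = [(9.5, True), (8.0, True), (6.5, False), (6.0, True), (3.0, False)]
result = sweep_cutoffs(toy_pairs, [3.0, 6.0, 8.0, 10.0])
```

Running the same sweep separately for each blocking strategy or block size would give the kind of comparison described above.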
To improve the existing method, I am working on using a similarity weight in the agreement matrix. This weight will allow partial agreement between the linking variable values of a record pair to be recorded.
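As a rough illustration of the similarity weight idea, the sketch below scores each linking variable on a scale from 0 to 1. The choice of Python's standard-library `difflib.SequenceMatcher` as the similarity measure is my assumption for the example; the research may use a different string comparator. The names `similarity_weight` and `agreement_row` and the sample records are hypothetical.

```python
from difflib import SequenceMatcher


def similarity_weight(a, b):
    """Return a weight in [0, 1]: 1.0 for exact agreement after
    normalisation, 0.0 when either value is missing or empty, and a
    partial score otherwise.

    Assumption: SequenceMatcher.ratio() stands in for whatever
    similarity measure the method eventually adopts.
    """
    if a is None or b is None:
        return 0.0
    a, b = str(a).strip().lower(), str(b).strip().lower()
    if not a or not b:
        return 0.0
    if a == b:
        return 1.0
    return SequenceMatcher(None, a, b).ratio()


def agreement_row(rec_a, rec_b, linking_vars):
    """One row of the agreement matrix: a weight per linking variable."""
    return [similarity_weight(rec_a.get(v), rec_b.get(v)) for v in linking_vars]


# Hypothetical record pair, for illustration only.
rec_a = {"first_name": "Jon", "surname": "Smith", "birth_year": "1980"}
rec_b = {"first_name": "John", "surname": "Smith", "birth_year": "1981"}
row = agreement_row(rec_a, rec_b, ["first_name", "surname", "birth_year"])
```

Here the surname agrees fully, while the first name and birth year receive weights strictly between 0 and 1 instead of being counted as plain disagreements.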