Comparing 2 data sets to check for consistency

I am working with a massive social network data file and had interns re-enter a random 5% of the data so I could compare it to the master file and check the original for consistency and reliability. The problem is that different people entered the data in different orders, so I'm not sure how to check that the data match, or how I should order the data before comparing. Does anyone have experience with this sort of thing or any suggestions on how to approach it?

This will depend on the object type of the network. At its simplest, it is a two-column data frame or matrix of node-node pairs (an edge list) that can be sorted. Comparison will be easier if the nodes are encoded as integers.
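A minimal sketch of the sort-then-compare idea, assuming each data set has already been loaded as a list of `(node, node)` pairs with integer node codes, and that both lists cover the same records. The function names are hypothetical. For undirected ties, `(3, 1)` and `(1, 3)` describe the same edge, so each pair is put in a fixed order before the whole list is sorted; for directed ties, pass `directed=True` to skip that step.

```python
def canonical_edges(edges, directed=False):
    """Return the edges in a fixed order so that entry order no longer matters."""
    pairs = []
    for a, b in edges:
        # Undirected: store every pair as (smaller, larger).
        pairs.append((a, b) if directed or a <= b else (b, a))
    # Sorting keeps duplicates, so comparing sorted lists compares multisets.
    return sorted(pairs)


def edges_match(edges_a, edges_b, directed=False):
    """True if both edge lists contain the same edges, ignoring entry order."""
    return canonical_edges(edges_a, directed) == canonical_edges(edges_b, directed)


master = [(1, 2), (2, 3), (4, 1)]
reentered = [(3, 2), (1, 4), (2, 1)]
print(edges_match(master, reentered))                 # True
print(edges_match(master, reentered, directed=True))  # False
```

Because duplicates survive the sort, a tie entered twice in one file but once in the other still shows up as a mismatch.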


The data set looks like this (image not preserved), if that is at all helpful.

This presents two difficulties:

  1. There are duplicate records
  2. There are no unique identifiers
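One way to work around both difficulties, sketched under assumptions: with no unique identifier, the whole record can serve as its own key, and counting records (rather than just checking membership) keeps duplicates from hiding mismatches. This assumes each row is a tuple of fields; the `normalize` rules (trim whitespace, ignore case) and the sample rows are hypothetical and should be adjusted to whatever counts as "the same entry" in your data.

```python
from collections import Counter


def normalize(row):
    """Hypothetical cleanup so trivial typing differences do not count as mismatches."""
    return tuple(str(field).strip().casefold() for field in row)


def unmatched_rows(sample, master):
    """Re-entered rows that cannot be matched to the master file.

    The whole normalized row acts as the key. Counter keeps multiplicities,
    and subtraction drops non-positive counts, so a row entered twice in the
    sample but only once in the master is reported once.
    """
    return Counter(normalize(r) for r in sample) - Counter(normalize(r) for r in master)


master = [
    ("Alice", "Bob", "friend"),
    ("Alice", "Bob", "friend"),
    ("Carol", "Dan", "coworker"),
    ("Eve", "Frank", "kin"),
]
sample = [
    (" alice", "Bob", "Friend"),
    ("Carol", "Dan", "coworker"),
    ("Carol", "Dan", "coworker"),
    ("Eve", "Frank", "friend"),
]

missing = unmatched_rows(sample, master)
for row, n in sorted(missing.items()):
    print(n, row)
matched = len(sample) - sum(missing.values())
print(f"{matched}/{len(sample)} sample rows matched")  # 2/4 sample rows matched
```

Whole-row matching is deliberately strict: a typo in any field makes the row unmatched, which is usually what a reliability check should flag. The unmatched rows can then be reviewed by hand against the master file.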

I'll contact you offline for more information.

