Finding Duplicate Documents in MongoDB

 
 
By Gérald Barré

Recently, I needed to create a unique index on a MongoDB collection. However, the collection already contained documents with duplicate values for the indexed field, so creating the index failed with the following error:

E11000 duplicate key error collection: db.collection index: index_name dup key: { key: "duplicate value" }

The error message shows only the first duplicate value, but there may be more. Instead of fixing them one by one, I used the aggregation pipeline to find them all at once.

Connect to the database using the command line or your preferred tool, then run the following aggregation pipeline:

JavaScript
db.Demo.aggregate([
    // Group by the key and compute the number of documents that match the key
    {
        $group: {
            _id: "$Nickname",  // or, to group by multiple fields: _id: { a: "$FirstName", b: "$LastName" }
            count: { $sum: 1 }
        }
    },
    // Keep only the groups with more than 1 document, which means that at least 2 documents share the same key
    {
        $match: {
            count: { $gt: 1 }
        }
    }
])

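This pipeline outputs one document per duplicate key, with the key in the _id field and the number of documents sharing it in the count field.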
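To illustrate the shape of that output without a database, here is a minimal sketch in plain JavaScript that mimics the $group and $match stages on an in-memory array. The sample documents and the findDuplicateKeys helper are hypothetical and only serve the illustration; they are not part of MongoDB.

```javascript
// Hypothetical sample documents standing in for the Demo collection
const docs = [
    { _id: 1, Nickname: "alice" },
    { _id: 2, Nickname: "bob" },
    { _id: 3, Nickname: "alice" },
    { _id: 4, Nickname: "carol" },
    { _id: 5, Nickname: "bob" },
    { _id: 6, Nickname: "alice" }
];

// Hypothetical helper that mimics the $group and $match stages:
// count the documents per key, then keep only the keys seen more than once
function findDuplicateKeys(documents, field) {
    const counts = new Map();
    for (const doc of documents) {
        counts.set(doc[field], (counts.get(doc[field]) || 0) + 1);
    }
    return [...counts]
        .filter(([, count]) => count > 1)
        .map(([key, count]) => ({ _id: key, count }));
}

console.log(findDuplicateKeys(docs, "Nickname"));
// [ { _id: 'alice', count: 3 }, { _id: 'bob', count: 2 } ]
```

The real pipeline returns documents of the same shape, although MongoDB does not guarantee the order of the groups unless you add a $sort stage.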

For a visual approach, you can build the same aggregation pipeline in MongoDB Compass.

Once you have identified the duplicates, you can remove the extra documents and safely create the unique index.
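How you remove the duplicates depends on which document of each group you want to keep. As a rough sketch, assuming it is acceptable to keep an arbitrary document per key and delete the others, the cleanup could look like the following in mongosh. The removeDuplicates helper is hypothetical, and keeping the first _id of each group is an assumption for the example, not a rule from MongoDB.

```javascript
// Hypothetical helper: `collection` is a mongosh collection such as db.Demo,
// `field` is the name of the field that must become unique
function removeDuplicates(collection, field) {
    let removed = 0;
    collection.aggregate([
        // Same grouping as the detection pipeline, but also collect the _id of every document in the group
        { $group: { _id: "$" + field, count: { $sum: 1 }, ids: { $push: "$_id" } } },
        { $match: { count: { $gt: 1 } } }
    ]).forEach(group => {
        // Assumption: keep the first document of the group and delete the others
        const idsToRemove = group.ids.slice(1);
        removed += collection.deleteMany({ _id: { $in: idsToRemove } }).deletedCount;
    });
    return removed;
}

// Usage in mongosh (not executed here):
// removeDuplicates(db.Demo, "Nickname");
// db.Demo.createIndex({ Nickname: 1 }, { unique: true });
```

Because this deletes data, it is worth trying it on a copy of the collection first. If the choice of the surviving document matters (for example, the most recent one), add a $sort stage before $group so that the order of the collected ids is defined.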
