MongoDB is a document-oriented NoSQL database designed for high-volume data storage, with flexible schemas, horizontal scaling through sharding, and built-in replication for high availability.

Addresses below are RFC 5737 documentation ranges or placeholders - swap in your own.

Table of Contents#

  1. Overview
  2. Installation
  3. Configuration
  4. Authentication and Security
  5. Basic Operations
  6. Indexing Strategies
  7. Aggregation Pipeline
  8. Replica Set Configuration
  9. Sharding Setup
  10. Backup and Restore
  11. Performance Tuning
  12. Monitoring
  13. Troubleshooting
  14. See Also
  15. Sources

1. Overview#

MongoDB stores data as flexible, JSON-like documents (BSON format) rather than rows and columns. Key characteristics:

  • Schema-less - documents in the same collection can have different fields
  • Rich query language - supports field queries, range queries, regex, and geospatial queries
  • Horizontal scaling - automatic sharding distributes data across multiple servers
  • Replica sets - automatic failover with configurable read preferences
  • Aggregation framework - pipeline-based data processing within the database
  • Change streams - real-time notifications on data changes

Architecture Components#

| Component | Purpose |
|---|---|
| mongod | Primary database process (data storage) |
| mongos | Query router for sharded clusters |
| mongosh | Interactive JavaScript shell (replaces legacy mongo) |
| Config servers | Store metadata and routing for sharded clusters |

2. Installation#

2.1 Debian/Ubuntu#

# Import GPG key
curl -fsSL https://www.mongodb.org/static/pgp/server-7.0.asc | \
  sudo gpg -o /usr/share/keyrings/mongodb-server-7.0.gpg --dearmor

# Add repository (Debian shown; on Ubuntu use .../apt/ubuntu and "multiverse" instead of "main")
echo "deb [signed-by=/usr/share/keyrings/mongodb-server-7.0.gpg] \
  https://repo.mongodb.org/apt/debian $(lsb_release -sc)/mongodb-org/7.0 main" | \
  sudo tee /etc/apt/sources.list.d/mongodb-org-7.0.list

sudo apt update
sudo apt install -y mongodb-org

2.2 RHEL/Rocky/Alma#

sudo tee /etc/yum.repos.d/mongodb-org-7.0.repo > /dev/null <<'EOF'
[mongodb-org-7.0]
name=MongoDB Repository
baseurl=https://repo.mongodb.org/yum/redhat/$releasever/mongodb-org/7.0/x86_64/
gpgcheck=1
enabled=1
gpgkey=https://pgp.mongodb.com/server-7.0.asc
EOF

sudo dnf install -y mongodb-org

2.3 Arch Linux#

# From AUR
yay -S mongodb-bin mongosh-bin

2.4 Enable and Start#

sudo systemctl enable --now mongod
sudo systemctl status mongod

3. Configuration#

The primary configuration file is /etc/mongod.conf (YAML format):

# Storage
storage:
  dbPath: /var/lib/mongodb
  # Journaling is always on since MongoDB 6.1; storage.journal.enabled was removed
  wiredTiger:
    engineConfig:
      cacheSizeGB: 2  # Default: the larger of 50% of (RAM - 1 GB) or 256 MB

# Networking
net:
  port: 27017
  bindIp: 127.0.0.1  # Add node IPs for cluster access
  maxIncomingConnections: 65536

# Security
security:
  authorization: disabled  # Enable after creating admin user

# Logging
systemLog:
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod.log
  logRotate: reopen

# Process management
processManagement:
  timeZoneInfo: /usr/share/zoneinfo

After changes:

sudo systemctl restart mongod

4. Authentication and Security#

4.1 Create Administrative User#

Start with authorization disabled, then create an admin:

mongosh
use admin
db.createUser({
  user: "admin",
  pwd: passwordPrompt(),
  roles: [
    { role: "userAdminAnyDatabase", db: "admin" },
    { role: "readWriteAnyDatabase", db: "admin" },
    { role: "clusterAdmin", db: "admin" }
  ]
})

4.2 Enable Authorization#

Edit /etc/mongod.conf:

security:
  authorization: enabled

Then restart the service:

sudo systemctl restart mongod

Connect with credentials:

mongosh -u admin -p --authenticationDatabase admin

4.3 Create Application Users#

use appdb
db.createUser({
  user: "appuser",
  pwd: passwordPrompt(),
  roles: [
    { role: "readWrite", db: "appdb" }
  ]
})

4.4 Built-in Roles#

| Role | Scope | Permissions |
|---|---|---|
| read | Database | Read all non-system collections |
| readWrite | Database | Read and write |
| dbAdmin | Database | Schema management, indexing, stats |
| userAdmin | Database | Create and manage users |
| clusterAdmin | Cluster | Manage replica sets and sharding |
| root | All | Superuser access |

4.5 TLS/SSL Configuration#

net:
  tls:
    mode: requireTLS
    certificateKeyFile: /etc/ssl/mongodb/server.pem
    CAFile: /etc/ssl/mongodb/ca.pem

Connect with TLS:

mongosh --tls --tlsCAFile /etc/ssl/mongodb/ca.pem \
  --tlsCertificateKeyFile /etc/ssl/mongodb/client.pem

4.6 Network Hardening#

# Firewall: only allow MongoDB from app servers (run as root)
sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="192.0.2.0/24" port port="27017" protocol="tcp" accept'
sudo firewall-cmd --reload

5. Basic Operations#

5.1 Database and Collection Management#

// Show databases
show dbs

// Switch to (or create) a database
use myapp

// Show collections
show collections

// Create a collection with options
db.createCollection("logs", {
  capped: true,
  size: 1073741824,  // 1 GB
  max: 1000000       // Max documents
})

// Drop a collection
db.logs.drop()

// Drop a database
db.dropDatabase()

5.2 CRUD Operations#

// INSERT
db.users.insertOne({ name: "Alice", email: "alice@example.com", age: 30 })

db.users.insertMany([
  { name: "Bob", email: "bob@example.com", age: 25 },
  { name: "Carol", email: "carol@example.com", age: 35 }
])

// FIND
db.users.findOne({ name: "Alice" })

db.users.find({ age: { $gte: 25 } }).sort({ age: 1 }).limit(10)

db.users.find(
  { age: { $gte: 25, $lte: 35 } },
  { name: 1, email: 1, _id: 0 }  // Projection
)

// UPDATE
db.users.updateOne(
  { name: "Alice" },
  { $set: { age: 31 }, $currentDate: { lastModified: true } }
)

db.users.updateMany(
  { age: { $lt: 30 } },
  { $set: { status: "young" } }
)

// DELETE
db.users.deleteOne({ name: "Alice" })
db.users.deleteMany({ status: "inactive" })

5.3 Query Operators#

| Operator | Example | Description |
|---|---|---|
| $eq | { age: { $eq: 30 } } | Equals |
| $gt, $gte | { age: { $gt: 25 } } | Greater than (or equal) |
| $lt, $lte | { age: { $lt: 35 } } | Less than (or equal) |
| $in | { status: { $in: ["A","B"] } } | Matches any in array |
| $and | { $and: [{age: {$gt:25}}, {status:"A"}] } | Logical AND |
| $or | { $or: [{age: 25}, {age: 30}] } | Logical OR |
| $exists | { email: { $exists: true } } | Field exists |
| $regex | { name: { $regex: /^A/i } } | Pattern match |
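
To make the operator semantics concrete, here is a minimal sketch of a matcher written in plain JavaScript. It is not how the server evaluates queries: `matches` and `fieldOps` are hypothetical names, only the operators from the table are covered, fields are assumed to hold scalar values, and the sample users are invented for illustration.

```javascript
// Simplified, scalar-only versions of the operators in the table above.
const fieldOps = {
  $eq: (v, arg) => v === arg,
  $gt: (v, arg) => v > arg,
  $gte: (v, arg) => v >= arg,
  $lt: (v, arg) => v < arg,
  $lte: (v, arg) => v <= arg,
  $in: (v, arg) => arg.includes(v),
  $exists: (v, arg) => (v !== undefined) === arg,
  $regex: (v, arg) => typeof v === "string" && arg.test(v),
};

// Returns true when `doc` satisfies every condition in `query`.
function matches(doc, query) {
  return Object.entries(query).every(([key, cond]) => {
    if (key === "$and") return cond.every((q) => matches(doc, q));
    if (key === "$or") return cond.some((q) => matches(doc, q));
    const value = doc[key];
    const isOperatorObject =
      cond !== null &&
      typeof cond === "object" &&
      !(cond instanceof RegExp) &&
      Object.keys(cond).some((k) => k.startsWith("$"));
    if (!isOperatorObject) return value === cond; // { age: 25 } is shorthand for $eq
    // Several operators on one field are combined with AND.
    return Object.entries(cond).every(([op, arg]) => fieldOps[op](value, arg));
  });
}

// Invented sample data: Bob deliberately has no email field.
const users = [
  { name: "Alice", email: "alice@example.com", age: 30 },
  { name: "Bob", age: 25 },
  { name: "Carol", email: "carol@example.com", age: 35 },
];
const names = (query) => users.filter((u) => matches(u, query)).map((u) => u.name);

console.log(names({ age: { $gte: 25, $lte: 30 } }));     // [ 'Alice', 'Bob' ]
console.log(names({ $or: [{ age: 25 }, { age: 35 }] })); // [ 'Bob', 'Carol' ]
console.log(names({ name: { $regex: /^a/i } }));         // [ 'Alice' ]
console.log(names({ email: { $exists: false } }));       // [ 'Bob' ]
```

The real query engine also handles arrays, nested paths, and type-aware comparison, none of which this sketch attempts.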

6. Indexing Strategies#

6.1 Index Types#

// Single field index
db.users.createIndex({ email: 1 })  // 1=ascending, -1=descending

// Compound index (order matters for query optimization)
db.orders.createIndex({ customerId: 1, orderDate: -1 })

// Unique index
db.users.createIndex({ email: 1 }, { unique: true })

// TTL index (auto-expire documents)
db.sessions.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 })

// Text index (full-text search)
db.articles.createIndex({ title: "text", body: "text" })

// Geospatial index
db.locations.createIndex({ coordinates: "2dsphere" })

// Partial index (index subset of documents)
db.orders.createIndex(
  { status: 1 },
  { partialFilterExpression: { status: { $eq: "pending" } } }
)

6.2 Index Management#

// List all indexes on a collection
db.users.getIndexes()

// Explain a query (check if index is used)
db.users.find({ email: "alice@example.com" }).explain("executionStats")

// Drop an index
db.users.dropIndex("email_1")

// Hide an index (test impact without dropping)
db.users.hideIndex("email_1")
db.users.unhideIndex("email_1")

6.3 Index Best Practices#

  • Index fields that appear in find(), sort(), and $match stages
  • Compound indexes follow the ESR rule: Equality, Sort, Range
  • Avoid indexes on fields with low cardinality (for example, boolean fields)
  • Monitor index size with db.collection.stats().indexSizes
  • Use explain() to verify index usage before and after changes
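
As an illustration of the ESR rule, the sketch below assembles a compound index specification from a query shape. `buildEsrIndex` is a hypothetical helper, not a MongoDB API, and the field names are assumptions chosen to resemble the orders collection used elsewhere on this page.

```javascript
// Build a compound index specification following the ESR rule:
// equality fields first, then sort fields, then range fields.
function buildEsrIndex({ equality = [], sort = [], range = [] }) {
  const spec = {};
  for (const field of equality) spec[field] = 1; // direction does not matter for equality
  for (const [field, direction] of sort) spec[field] = direction;
  for (const field of range) {
    if (!(field in spec)) spec[field] = 1; // skip fields that were already placed
  }
  return spec;
}

// Query shape (assumed):
//   find({ status: "completed", amount: { $gte: 100 } }).sort({ orderDate: -1 })
const esrSpec = buildEsrIndex({
  equality: ["status"],
  sort: [["orderDate", -1]],
  range: ["amount"],
});

console.log(JSON.stringify(esrSpec)); // {"status":1,"orderDate":-1,"amount":1}
```

In mongosh the resulting object could be passed to `db.orders.createIndex(esrSpec)`; checking the plan with explain() afterwards is still advisable.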

7. Aggregation Pipeline#

The aggregation pipeline processes documents through a sequence of stages:

7.1 Common Stages#

// $match - filter documents (like find)
// $group - aggregate values
// $sort - order results
// $project - reshape documents
// $lookup - left outer join
// $unwind - deconstruct arrays
// $limit / $skip - pagination

db.orders.aggregate([
  // Stage 1: Filter
  { $match: { status: "completed", orderDate: { $gte: ISODate("2026-01-01") } } },

  // Stage 2: Group by customer
  { $group: {
    _id: "$customerId",
    totalSpent: { $sum: "$amount" },
    orderCount: { $count: {} },
    avgOrder: { $avg: "$amount" }
  }},

  // Stage 3: Sort by total spent
  { $sort: { totalSpent: -1 } },

  // Stage 4: Top 10
  { $limit: 10 }
])
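
To show what each stage contributes, the following sketch reproduces the same four steps in plain JavaScript over an in-memory array. It is only an illustration of the data flow, not how the server executes a pipeline; `topCustomers` is a hypothetical name and the sample orders are invented.

```javascript
// Invented sample data for illustration.
const orders = [
  { customerId: "c1", status: "completed", amount: 40, orderDate: new Date("2026-02-01") },
  { customerId: "c2", status: "completed", amount: 150, orderDate: new Date("2026-02-03") },
  { customerId: "c1", status: "completed", amount: 60, orderDate: new Date("2026-03-10") },
  { customerId: "c1", status: "pending", amount: 500, orderDate: new Date("2026-03-11") },
  { customerId: "c2", status: "completed", amount: 20, orderDate: new Date("2025-12-31") },
];

function topCustomers(docs, since, limit) {
  // $match: keep completed orders on or after `since`
  const matched = docs.filter((o) => o.status === "completed" && o.orderDate >= since);

  // $group: accumulate a running total and count per customer
  const groups = new Map();
  for (const o of matched) {
    const g = groups.get(o.customerId) ?? { _id: o.customerId, totalSpent: 0, orderCount: 0 };
    g.totalSpent += o.amount;
    g.orderCount += 1;
    groups.set(o.customerId, g);
  }

  return [...groups.values()]
    .map((g) => ({ ...g, avgOrder: g.totalSpent / g.orderCount })) // $avg
    .sort((a, b) => b.totalSpent - a.totalSpent) // $sort: { totalSpent: -1 }
    .slice(0, limit); // $limit
}

const topSpenders = topCustomers(orders, new Date("2026-01-01"), 10);
console.log(topSpenders);
```

With this sample data, customer c2 comes first with a total of 150 from one order, followed by c1 with a total of 100 over two orders (average 50). The pending order and the order from 2025 are dropped by the filter step.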

7.2 $lookup (Join)#

db.orders.aggregate([
  { $lookup: {
    from: "customers",
    localField: "customerId",
    foreignField: "_id",
    as: "customer"
  }},
  { $unwind: "$customer" },
  { $project: {
    orderDate: 1,
    amount: 1,
    "customer.name": 1,
    "customer.email": 1
  }}
])

7.3 $bucket (Histogram)#

db.users.aggregate([
  { $bucket: {
    groupBy: "$age",
    boundaries: [0, 18, 30, 45, 60, 120],
    default: "Other",
    output: {
      count: { $sum: 1 },
      names: { $push: "$name" }
    }
  }}
])
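
The boundary handling is the part of $bucket that most often surprises: each bucket includes its lower bound and excludes its upper bound, and values outside every bucket go to the default. A minimal sketch of that assignment rule, assuming numeric values only (`bucketFor` is a hypothetical helper):

```javascript
// Assign a value to a bucket: each bucket covers [lower, upper)
// and is identified by its lower bound.
function bucketFor(value, boundaries, defaultBucket) {
  for (let i = 0; i < boundaries.length - 1; i++) {
    if (value >= boundaries[i] && value < boundaries[i + 1]) return boundaries[i];
  }
  return defaultBucket; // outside all boundaries
}

const boundaries = [0, 18, 30, 45, 60, 120];
const ages = [17, 18, 29, 30, 120, 64];

console.log(ages.map((age) => bucketFor(age, boundaries, "Other")));
// [ 0, 18, 18, 30, 'Other', 60 ]
```

Note that an age of exactly 120 falls into "Other", because 120 is the exclusive upper bound of the last bucket.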

8. Replica Set Configuration#

A replica set provides automatic failover. Use at least three voting members (for example, one primary and two secondaries) so that a majority can still elect a primary when one member is down.
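
The three-member recommendation follows from majority voting: a primary can only be elected while more than half of the voting members are reachable. The arithmetic can be sketched as follows (`majority` and `faultTolerance` are hypothetical helper names):

```javascript
// Votes needed to elect a primary.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many voting members can be lost while an election is still possible.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

for (const n of [2, 3, 4, 5]) {
  console.log(`${n} members: majority ${majority(n)}, tolerates ${faultTolerance(n)} failure(s)`);
}
// 2 members: majority 2, tolerates 0 failure(s)
// 3 members: majority 2, tolerates 1 failure(s)
// 4 members: majority 3, tolerates 1 failure(s)
// 5 members: majority 3, tolerates 2 failure(s)
```

Two members tolerate no failures at all, and a fourth member adds no tolerance over three, which is why odd member counts are the usual choice.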

8.1 Configure Each Member#

On each node, edit /etc/mongod.conf:

net:
  bindIp: 0.0.0.0  # All interfaces; prefer listing specific addresses and restrict access by firewall (see 4.6)
  port: 27017

replication:
  replSetName: "rs0"

security:
  keyFile: /etc/mongodb/keyfile  # Shared secret for inter-member auth

8.2 Create Keyfile#

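openssl rand -base64 756 | sudo tee /etc/mongodb/keyfile > /dev/null
sudo chmod 400 /etc/mongodb/keyfile
# Owner must be the account mongod runs as: mongodb on Debian/Ubuntu, mongod on RHEL-family packages
sudo chown mongodb:mongodb /etc/mongodb/keyfile

# Copy the SAME keyfile to all replica set members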

8.3 Initialize the Replica Set#

Connect to one member and initiate:

mongosh

rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "192.0.2.11:27017", priority: 2 },
    { _id: 1, host: "192.0.2.12:27017", priority: 1 },
    { _id: 2, host: "192.0.2.13:27017", priority: 1 }
  ]
})

8.4 Verify Replica Set#

rs.status()        // Detailed status
rs.conf()          // Current configuration
db.hello()         // Which node is primary (replaces the deprecated isMaster)
db.printReplicationInfo()       // Oplog window
db.printSecondaryReplicationInfo()  // Replication lag
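
If lag needs to be checked programmatically rather than read from the printed report, it can be derived from the output of rs.status(). The sketch below assumes only the `members[].name`, `members[].stateStr`, and `members[].optimeDate` fields; `replicationLagSeconds` is a hypothetical helper and the sample status object is invented.

```javascript
// Compute per-secondary lag as the difference between the primary's
// last applied operation time and each secondary's.
function replicationLagSeconds(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) return null; // no primary, for example during an election
  return status.members
    .filter((m) => m.stateStr === "SECONDARY")
    .map((m) => ({
      host: m.name,
      lagSeconds: (primary.optimeDate - m.optimeDate) / 1000, // Date difference is in ms
    }));
}

// Invented sample with the same shape as rs.status().
const sampleStatus = {
  members: [
    { name: "192.0.2.11:27017", stateStr: "PRIMARY", optimeDate: new Date("2026-03-22T10:00:10Z") },
    { name: "192.0.2.12:27017", stateStr: "SECONDARY", optimeDate: new Date("2026-03-22T10:00:08Z") },
    { name: "192.0.2.13:27017", stateStr: "SECONDARY", optimeDate: new Date("2026-03-22T10:00:10Z") },
  ],
};

console.log(replicationLagSeconds(sampleStatus));
```

In mongosh the function could be called as `replicationLagSeconds(rs.status())`. For the sample above it reports 2 seconds for 192.0.2.12 and 0 seconds for 192.0.2.13.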

8.5 Read Preferences#

// Read from nearest member (reduces latency)
db.users.find().readPref("nearest")

// Available preferences:
// primary        - default, all reads from primary
// primaryPreferred - primary, fallback to secondary
// secondary      - always read from secondary
// secondaryPreferred - secondary, fallback to primary
// nearest        - lowest network latency member

8.6 Arbiter (When Only 2 Data Nodes Available)#

// Add an arbiter (votes but holds no data)
rs.addArb("192.0.2.14:27017")

9. Sharding Setup#

Sharding distributes data across multiple replica sets for horizontal scaling.

9.1 Architecture#

Application
    |
  mongos (query router)
    |
  Config servers (3-member replica set)
    |
  +---------+---------+
  | Shard 1 | Shard 2 | ...  (each shard is a replica set)
  +---------+---------+

9.2 Deploy Config Servers#

On each config server, set /etc/mongod.conf:

sharding:
  clusterRole: configsvr
replication:
  replSetName: "configRS"
net:
  port: 27019
  bindIp: 0.0.0.0

Initialize:

rs.initiate({
  _id: "configRS",
  configsvr: true,
  members: [
    { _id: 0, host: "192.0.2.21:27019" },
    { _id: 1, host: "192.0.2.22:27019" },
    { _id: 2, host: "192.0.2.23:27019" }
  ]
})

9.3 Deploy Shard Replica Sets#

Each shard is a standard replica set configured with shardsvr:

sharding:
  clusterRole: shardsvr
replication:
  replSetName: "shard1RS"
net:
  port: 27018
  bindIp: 0.0.0.0

9.4 Start mongos Router#

mongos --configdb "configRS/192.0.2.21:27019,192.0.2.22:27019,192.0.2.23:27019" \
  --bind_ip 0.0.0.0 --port 27017

9.5 Add Shards and Enable Sharding#

Connect to mongos:

// Add each shard replica set
sh.addShard("shard1RS/192.0.2.31:27018,192.0.2.32:27018,192.0.2.33:27018")
sh.addShard("shard2RS/192.0.2.41:27018,192.0.2.42:27018,192.0.2.43:27018")

// Enable sharding on a database
sh.enableSharding("myapp")

// Shard a collection (choose shard key carefully)
sh.shardCollection("myapp.orders", { customerId: "hashed" })
// Or range-based:
sh.shardCollection("myapp.logs", { timestamp: 1 })

// Verify
sh.status()

9.6 Shard Key Selection#

| Strategy | Use Case | Pros | Cons |
|---|---|---|---|
| Hashed | Even distribution needed | Uniform writes | Range queries on the key are broadcast to all shards |
| Range | Time-series, range queries | Efficient range scans | Hot spots on recent data |
| Compound | Mixed query patterns | Balanced reads/writes | More complex to design |
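
The hot-spot trade-off can be seen in a toy model. The sketch below assumes four shards, a range-sharded timestamp key whose last chunk is open-ended, and a stand-in hash function (`toyHash`, an FNV-1a variant). It is not MongoDB's actual hashing or chunk placement; it only illustrates why monotonically increasing keys concentrate writes under range sharding.

```javascript
const SHARDS = 4;

// Stand-in hash (FNV-1a over the decimal string). NOT MongoDB's hash function.
function toyHash(n) {
  let h = 0x811c9dc5;
  for (const ch of String(n)) {
    h ^= ch.charCodeAt(0);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Range sharding: existing chunks split timestamps 0-999 evenly; the last
// chunk is open-ended, so anything newer lands on the last shard.
function rangeShard(ts) {
  return Math.min(Math.floor(ts / 250), SHARDS - 1);
}

// Hashed sharding: placement depends on the hash, not on key order.
function hashedShard(ts) {
  return toyHash(ts) % SHARDS;
}

// Count how many keys each shard receives.
function distribute(keys, shardOf) {
  const counts = new Array(SHARDS).fill(0);
  for (const k of keys) counts[shardOf(k)] += 1;
  return counts;
}

// 100 new documents with ever-increasing timestamps.
const newTimestamps = Array.from({ length: 100 }, (_, i) => 1000 + i);
const rangeCounts = distribute(newTimestamps, rangeShard);
const hashedCounts = distribute(newTimestamps, hashedShard);

console.log("range :", rangeCounts);
console.log("hashed:", hashedCounts);
```

Under the range model all 100 inserts land on the last shard, while the hashed model spreads them across all four. The price is that a query for a timestamp range can no longer be routed to a single shard.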

10. Backup and Restore#

10.1 mongodump / mongorestore#

# Full backup (all databases)
mongodump --uri="mongodb://<user>:<pass>@<host>:27017" \
  --out=/backup/mongo/$(date +%Y%m%d)

# Single database
mongodump --db=<database> --out=/backup/mongo/$(date +%Y%m%d)

# Single collection
mongodump --db=<database> --collection=<collection> --out=/backup/

# Compressed backup
mongodump --gzip --archive=/backup/mongo-$(date +%Y%m%d).gz

# Restore full backup
mongorestore --drop /backup/mongo/20260322/

# Restore single database
mongorestore --db=<database> --drop /backup/mongo/20260322/<database>/

# Restore from compressed archive
mongorestore --gzip --archive=/backup/mongo-20260322.gz --drop

10.2 Filesystem Snapshot (WiredTiger)#

For large databases, filesystem snapshots are faster:

# Lock writes
mongosh --eval 'db.fsyncLock()'

# Take LVM or ZFS snapshot
lvcreate -L 10G -s -n mongo-snap /dev/data/mongodb

# Unlock
mongosh --eval 'db.fsyncUnlock()'

10.3 Automated Backup Script#

#!/bin/bash
# /usr/local/bin/mongo-backup.sh
set -u
BACKUP_DIR="/backup/mongodb"
RETENTION_DAYS=14
DATE=$(date +%Y%m%d-%H%M%S)
ARCHIVE="${BACKUP_DIR}/mongo-${DATE}.gz"

# Test mongodump directly so old backups are pruned only after a successful dump
if mongodump --gzip --archive="${ARCHIVE}" \
  --uri="mongodb://backupuser:<password>@localhost:27017/?authSource=admin"; then
    echo "Backup successful: ${ARCHIVE}"
    # Delete archives older than the retention window
    find "${BACKUP_DIR}" -name "mongo-*.gz" -mtime +"${RETENTION_DAYS}" -delete
else
    echo "Backup FAILED" >&2
    rm -f "${ARCHIVE}"  # Remove the partial archive
    exit 1
fi

10.4 Continuous Backup with Oplog#

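# Dump with the oplog entries written during the dump, giving a consistent point-in-time snapshot
# (requires a replica set member and a full-instance dump, without --db or --collection)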
mongodump --oplog --out=/backup/mongo/$(date +%Y%m%d)

# Restore with oplog replay
mongorestore --oplogReplay /backup/mongo/20260322/

11. Performance Tuning#

11.1 WiredTiger Cache#

storage:
  wiredTiger:
    engineConfig:
      # Default: 50% of (RAM - 1GB), min 256MB
      # Set explicitly for predictable behavior
      cacheSizeGB: 4
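
The default quoted in the comment above is easy to work out by hand. This small sketch applies the stated rule (50% of RAM minus 1 GB, with a 256 MB floor); `defaultCacheSizeGB` is a hypothetical helper, not a MongoDB function.

```javascript
// Default WiredTiger cache size in GB for a host with `ramGB` of memory:
// the larger of 50% of (RAM - 1 GB) and 256 MB (0.25 GB).
function defaultCacheSizeGB(ramGB) {
  return Math.max(0.5 * (ramGB - 1), 0.25);
}

for (const ram of [1, 4, 16, 64]) {
  console.log(`${ram} GB RAM -> ${defaultCacheSizeGB(ram)} GB cache`);
}
// 1 GB RAM -> 0.25 GB cache
// 4 GB RAM -> 1.5 GB cache
// 16 GB RAM -> 7.5 GB cache
// 64 GB RAM -> 31.5 GB cache
```

On a host shared with other services, or inside a container with a memory limit, the computed default may be larger than intended, which is one reason to set cacheSizeGB explicitly.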

11.2 Connection Pool#

net:
  maxIncomingConnections: 65536

# Application-side: configure connection pool
# Example (Node.js driver):
# MongoClient.connect(uri, { maxPoolSize: 50, minPoolSize: 10 })

11.3 Read/Write Concerns#

// Write concern: acknowledge after majority replication
db.orders.insertOne(
  { item: "widget" },
  { writeConcern: { w: "majority", wtimeout: 5000 } }
)

// Read concern: only return data committed to majority
db.orders.find().readConcern("majority")

11.4 Profiler#

// Enable profiling for slow operations (> 100ms)
db.setProfilingLevel(1, { slowms: 100 })

// View slow queries
db.system.profile.find().sort({ ts: -1 }).limit(10).pretty()

// Disable profiling
db.setProfilingLevel(0)

12. Monitoring#

12.1 Built-in Commands#

// Server status overview
db.serverStatus()

// Current operations
db.currentOp()

// Collection statistics
db.orders.stats()

// Database statistics
db.stats()

12.2 mongotop and mongostat#

# Real-time read/write per collection
mongotop --uri="mongodb://admin:<pass>@localhost:27017/?authSource=admin" 5

# Server metrics (inserts/queries/updates/deletes per second)
mongostat --uri="mongodb://admin:<pass>@localhost:27017/?authSource=admin" 5

12.3 Key Metrics to Watch#

| Metric | Source | Healthy Value |
|---|---|---|
| Cache hit ratio | db.serverStatus().wiredTiger.cache | > 95% |
| Replication lag | rs.printSecondaryReplicationInfo() | < 10 seconds |
| Connections in use | db.serverStatus().connections.current | Well below maxIncomingConnections |
| Page faults | db.serverStatus().extra_info.page_faults | Low and stable |
| Opcounters | db.serverStatus().opcounters | Consistent with workload |
| Tickets available | db.serverStatus().wiredTiger.concurrentTransactions | > 0 |
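
The cache hit ratio is not reported directly; one common way to derive it is from two counters in the wiredTiger.cache section. The sketch below assumes the counter names "pages requested from the cache" and "pages read into cache"; `cacheHitRatio` is a hypothetical helper and the sample numbers are invented.

```javascript
// Fraction of page requests served from the cache without a disk read.
// `cache` is expected to look like db.serverStatus().wiredTiger.cache.
function cacheHitRatio(cache) {
  const requested = cache["pages requested from the cache"];
  const readIn = cache["pages read into cache"];
  if (!requested) return null; // no requests recorded yet
  return 1 - readIn / requested;
}

// Invented sample counters.
const sampleCache = {
  "pages requested from the cache": 1024,
  "pages read into cache": 32,
};

console.log(cacheHitRatio(sampleCache)); // 0.96875
```

In mongosh this could be called as `cacheHitRatio(db.serverStatus().wiredTiger.cache)`. The counters are cumulative since startup, so comparing two samples taken some minutes apart gives a better picture of current behavior.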

13. Troubleshooting#

| Issue | Cause | Solution |
|---|---|---|
| MongoServerError: Authentication failed | Wrong credentials or auth database | Use --authenticationDatabase admin; verify user with db.getUsers() |
| Replica set election loops | Network flapping or clock skew | Check NTP sync; verify network between members; review rs.status() |
| Secondary falling behind | Oplog too small or heavy write load | Increase oplog size (replSetResizeOplog); check disk I/O on secondary |
| too many open files | ulimit too low for connection count | Set LimitNOFILE=65536 in systemd unit; verify with ulimit -n |
| Slow queries | Missing indexes or full collection scans | Enable profiler; check explain() for COLLSCAN; add appropriate indexes |
| Sharded cluster imbalance | Poor shard key choice | Monitor chunk distribution with sh.status(); consider resharding |
| Out of disk space | Journal + data + oplog filling disk | Add storage; enable directoryPerDB; compact with db.runCommand({compact:"collection"}) |
| WT_CACHE_FULL | WiredTiger cache undersized | Increase cacheSizeGB; check for bloated indexes |
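
For the slow-query case, checking explain() output for COLLSCAN can be automated. The sketch below walks the winning plan and reports whether any stage is a collection scan. It assumes the plan nests children under `inputStage`, `inputStages`, or `queryPlan`; `planStages` and `hasCollScan` are hypothetical helpers and the sample plans are invented.

```javascript
// Collect the stage names of a plan tree, depth first.
function planStages(plan) {
  if (!plan || typeof plan !== "object") return [];
  const children = [plan.inputStage, ...(plan.inputStages ?? []), plan.queryPlan];
  return [plan.stage, ...children.flatMap(planStages)].filter(Boolean);
}

// True when the winning plan of an explain() result contains a COLLSCAN stage.
function hasCollScan(explainOutput) {
  return planStages(explainOutput.queryPlanner.winningPlan).includes("COLLSCAN");
}

// Invented explain() fragments.
const indexedPlan = {
  queryPlanner: {
    winningPlan: { stage: "FETCH", inputStage: { stage: "IXSCAN", indexName: "email_1" } },
  },
};
const unindexedPlan = {
  queryPlanner: { winningPlan: { stage: "COLLSCAN", direction: "forward" } },
};

console.log(hasCollScan(indexedPlan));   // false
console.log(hasCollScan(unindexedPlan)); // true
```

In mongosh the helper could be applied as `hasCollScan(db.users.find({ email: "alice@example.com" }).explain())`. Plan shapes vary between server versions, so treat a negative result as a hint rather than proof.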

See Also#

Sources#