Document-oriented NoSQL database designed for high-volume data storage with flexible schemas, horizontal scaling through sharding, and built-in replication for high availability.
Addresses below are RFC 5737 documentation ranges or placeholders - swap in your own.
Table of Contents#
- Overview
- Installation
- Configuration
- Authentication and Security
- Basic Operations
- Indexing Strategies
- Aggregation Pipeline
- Replica Set Configuration
- Sharding Setup
- Backup and Restore
- Performance Tuning
- Monitoring
- Troubleshooting
- See Also
- Sources
1. Overview#
MongoDB stores data as flexible, JSON-like documents (BSON format) rather than rows and columns. Key characteristics:
- Schema-less - documents in the same collection can have different fields
- Rich query language - supports field queries, range queries, regex, and geospatial queries
- Horizontal scaling - automatic sharding distributes data across multiple servers
- Replica sets - automatic failover with configurable read preferences
- Aggregation framework - pipeline-based data processing within the database
- Change streams - real-time notifications on data changes
Architecture Components#
| Component | Purpose |
|---|---|
| mongod | Primary database process (data storage) |
| mongos | Query router for sharded clusters |
| mongosh | Interactive JavaScript shell (replaces legacy mongo) |
| Config servers | Store metadata and routing for sharded clusters |
2. Installation#
2.1 Debian/Ubuntu#
# Import GPG key
curl -fsSL https://www.mongodb.org/static/pgp/server-7.0.asc | \
sudo gpg -o /usr/share/keyrings/mongodb-server-7.0.gpg --dearmor
# Add repository (Debian path shown; on Ubuntu use apt/ubuntu and the "multiverse" component instead of "main")
echo "deb [signed-by=/usr/share/keyrings/mongodb-server-7.0.gpg] \
http://repo.mongodb.org/apt/debian $(lsb_release -sc)/mongodb-org/7.0 main" | \
sudo tee /etc/apt/sources.list.d/mongodb-org-7.0.list
sudo apt update
sudo apt install -y mongodb-org
2.2 RHEL/Rocky/Alma#
# Writing under /etc/yum.repos.d needs root, so pipe the heredoc through sudo tee
sudo tee /etc/yum.repos.d/mongodb-org-7.0.repo > /dev/null <<'EOF'
[mongodb-org-7.0]
name=MongoDB Repository
baseurl=https://repo.mongodb.org/yum/redhat/$releasever/mongodb-org/7.0/x86_64/
gpgcheck=1
enabled=1
gpgkey=https://pgp.mongodb.com/server-7.0.asc
EOF
sudo dnf install -y mongodb-org
2.3 Arch Linux#
# From AUR
yay -S mongodb-bin mongosh-bin
2.4 Enable and Start#
sudo systemctl enable --now mongod
sudo systemctl status mongod
3. Configuration#
The primary configuration file is /etc/mongod.conf (YAML format):
# Storage
storage:
  dbPath: /var/lib/mongodb
  # Journaling is always on since MongoDB 6.1; the storage.journal.enabled option was removed
  wiredTiger:
    engineConfig:
      cacheSizeGB: 2 # Default: 50% of (RAM - 1GB)
# Networking
net:
  port: 27017
  bindIp: 127.0.0.1 # Add node IPs for cluster access
  maxIncomingConnections: 65536
# Security
security:
  authorization: disabled # Enable after creating admin user
# Logging
systemLog:
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod.log
  logRotate: reopen
# Process management
processManagement:
  timeZoneInfo: /usr/share/zoneinfo
After changes:
sudo systemctl restart mongod
4. Authentication and Security#
4.1 Create Administrative User#
Start with authorization disabled, then create an admin:
mongosh
use admin
db.createUser({
  user: "admin",
  pwd: passwordPrompt(),
  roles: [
    { role: "userAdminAnyDatabase", db: "admin" },
    { role: "readWriteAnyDatabase", db: "admin" },
    { role: "clusterAdmin", db: "admin" }
  ]
})
4.2 Enable Authorization#
Edit /etc/mongod.conf:
security:
  authorization: enabled
Restart the service:
sudo systemctl restart mongod
Connect with credentials:
mongosh -u admin -p --authenticationDatabase admin
4.3 Create Application Users#
use appdb
db.createUser({
  user: "appuser",
  pwd: passwordPrompt(),
  roles: [
    { role: "readWrite", db: "appdb" }
  ]
})
4.4 Built-in Roles#
| Role | Scope | Permissions |
|---|---|---|
| read | Database | Read all non-system collections |
| readWrite | Database | Read and write |
| dbAdmin | Database | Schema management, indexing, stats |
| userAdmin | Database | Create and manage users |
| clusterAdmin | Cluster | Manage replica sets and sharding |
| root | All | Superuser access |
4.5 TLS/SSL Configuration#
net:
  tls:
    mode: requireTLS
    certificateKeyFile: /etc/ssl/mongodb/server.pem
    CAFile: /etc/ssl/mongodb/ca.pem
# Connect with TLS
mongosh --tls --tlsCAFile /etc/ssl/mongodb/ca.pem \
  --tlsCertificateKeyFile /etc/ssl/mongodb/client.pem
4.6 Network Hardening#
# Firewall: only allow MongoDB from app servers
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="192.0.2.0/24" port port="27017" protocol="tcp" accept'
firewall-cmd --reload
5. Basic Operations#
5.1 Database and Collection Management#
// Show databases
show dbs
// Switch to (or create) a database
use myapp
// Show collections
show collections
// Create a collection with options
db.createCollection("logs", {
  capped: true,
  size: 1073741824, // 1 GB
  max: 1000000 // Max documents
})
// Drop a collection
db.logs.drop()
// Drop a database
db.dropDatabase()
5.2 CRUD Operations#
// INSERT
db.users.insertOne({ name: "Alice", email: "alice@example.com", age: 30 })
db.users.insertMany([
  { name: "Bob", email: "bob@example.com", age: 25 },
  { name: "Carol", email: "carol@example.com", age: 35 }
])
// FIND
db.users.findOne({ name: "Alice" })
db.users.find({ age: { $gte: 25 } }).sort({ age: 1 }).limit(10)
db.users.find(
  { age: { $gte: 25, $lte: 35 } },
  { name: 1, email: 1, _id: 0 } // Projection
)
// UPDATE
db.users.updateOne(
  { name: "Alice" },
  { $set: { age: 31 }, $currentDate: { lastModified: true } }
)
db.users.updateMany(
  { age: { $lt: 30 } },
  { $set: { status: "young" } }
)
// DELETE
db.users.deleteOne({ name: "Alice" })
db.users.deleteMany({ status: "inactive" })
5.3 Query Operators#
| Operator | Example | Description |
|---|---|---|
| $eq | { age: { $eq: 30 } } | Equals |
| $gt, $gte | { age: { $gt: 25 } } | Greater than (or equal) |
| $lt, $lte | { age: { $lt: 35 } } | Less than (or equal) |
| $in | { status: { $in: ["A","B"] } } | Matches any in array |
| $and | { $and: [{age: {$gt:25}}, {status:"A"}] } | Logical AND |
| $or | { $or: [{age: 25}, {age: 30}] } | Logical OR |
| $exists | { email: { $exists: true } } | Field exists |
| $regex | { name: { $regex: /^A/i } } | Pattern match |
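To make the operator semantics in the table concrete, here is a toy in-memory evaluator in plain JavaScript. It is only a sketch for reasoning about filters: `matches` and the sample `users` array are hypothetical, this is not how the server evaluates queries, and it ignores arrays, nested paths, and BSON type ordering.

```javascript
// Toy evaluator for a subset of MongoDB query operators (illustration only).
const comparators = {
  $eq: (v, arg) => v === arg,
  $gt: (v, arg) => v > arg,
  $gte: (v, arg) => v >= arg,
  $lt: (v, arg) => v < arg,
  $lte: (v, arg) => v <= arg,
  $in: (v, arg) => arg.includes(v),
  $regex: (v, arg) => typeof v === "string" && new RegExp(arg).test(v),
};

function matches(doc, filter) {
  return Object.entries(filter).every(([key, cond]) => {
    if (key === "$and") return cond.every((f) => matches(doc, f));
    if (key === "$or") return cond.some((f) => matches(doc, f));
    const value = doc[key];
    // A bare value such as { status: "A" } is shorthand for $eq.
    if (cond === null || typeof cond !== "object") return value === cond;
    if (cond instanceof RegExp) return typeof value === "string" && cond.test(value);
    // Several operators on one field are combined with an implicit AND.
    return Object.entries(cond).every(([op, arg]) => {
      if (op === "$exists") return (key in doc) === arg;
      if (!(op in comparators)) throw new Error(`Unsupported operator: ${op}`);
      return comparators[op](value, arg);
    });
  });
}

// Hypothetical sample data.
const users = [
  { name: "Alice", age: 30, status: "A", email: "alice@example.com" },
  { name: "Bob", age: 25, status: "B" },
  { name: "Carol", age: 35, status: "A", email: "carol@example.com" },
];

const hits = users.filter((u) => matches(u, { age: { $gte: 25, $lte: 30 } }));
console.log(hits.map((u) => u.name)); // [ 'Alice', 'Bob' ]
```

Listing two operators under one field, as in the range filter above, behaves like an implicit AND, which is why an explicit $and is rarely needed for conditions on different fields.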
6. Indexing Strategies#
6.1 Index Types#
// Single field index
db.users.createIndex({ email: 1 }) // 1=ascending, -1=descending
// Compound index (order matters for query optimization)
db.orders.createIndex({ customerId: 1, orderDate: -1 })
// Unique index
db.users.createIndex({ email: 1 }, { unique: true })
// TTL index (auto-expire documents)
db.sessions.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 })
// Text index (full-text search)
db.articles.createIndex({ title: "text", body: "text" })
// Geospatial index
db.locations.createIndex({ coordinates: "2dsphere" })
// Partial index (index subset of documents)
db.orders.createIndex(
  { status: 1 },
  { partialFilterExpression: { status: { $eq: "pending" } } }
)
6.2 Index Management#
// List all indexes on a collection
db.users.getIndexes()
// Explain a query (check if index is used)
db.users.find({ email: "alice@example.com" }).explain("executionStats")
// Drop an index
db.users.dropIndex("email_1")
// Hide an index (test impact without dropping)
db.users.hideIndex("email_1")
db.users.unhideIndex("email_1")
6.3 Index Best Practices#
- Index fields that appear in find(), sort(), and $match stages
- Compound indexes follow the ESR rule: Equality, Sort, Range
- Avoid indexes on fields with low cardinality (for example, boolean fields)
- Monitor index size with db.collection.stats().indexSizes
- Use explain() to verify index usage before and after changes
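The ESR rule from the list above can be sketched as a small helper that orders the fields of a compound index. The function name `esrIndexSpec` and the `usage` description format are hypothetical, and the sketch assumes a single query shape; real index design usually has to balance several queries.

```javascript
// Order compound-index fields by the ESR rule: Equality, then Sort, then Range.
// `usage` is a hypothetical description of how one query uses each field.
function esrIndexSpec(usage) {
  const rank = { equality: 0, sort: 1, range: 2 };
  const spec = {};
  [...usage]
    .sort((a, b) => rank[a.role] - rank[b.role]) // stable sort keeps ties in input order
    .forEach(({ field, direction = 1 }) => {
      spec[field] = direction;
    });
  return spec;
}

// Query shape: find({ customerId: X, amount: { $gt: 100 } }).sort({ orderDate: -1 })
const spec = esrIndexSpec([
  { field: "amount", role: "range" },
  { field: "orderDate", role: "sort", direction: -1 },
  { field: "customerId", role: "equality" },
]);
console.log(JSON.stringify(spec));
// {"customerId":1,"orderDate":-1,"amount":1}
```

The resulting object could be passed to db.orders.createIndex(); checking the plan with explain() remains the way to confirm the index is actually chosen.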
7. Aggregation Pipeline#
The aggregation pipeline processes documents through a sequence of stages:
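As a rough mental model, each stage is a function from one list of documents to the next. The plain-JavaScript sketch below mimics $match, $group (sum only), $sort, and $limit on an in-memory array; the helper names and sample data are hypothetical, and the real pipeline runs inside the server with index support and streaming.

```javascript
// Each stage maps an array of documents to a new array; the pipeline chains them.
const orders = [
  { customerId: "c1", status: "completed", amount: 40 },
  { customerId: "c2", status: "completed", amount: 15 },
  { customerId: "c1", status: "completed", amount: 60 },
  { customerId: "c3", status: "pending", amount: 99 },
];

const match = (pred) => (docs) => docs.filter(pred);
const groupSum = (keyField, sumField) => (docs) => {
  const totals = new Map();
  for (const d of docs) {
    totals.set(d[keyField], (totals.get(d[keyField]) ?? 0) + d[sumField]);
  }
  return [...totals].map(([_id, total]) => ({ _id, total }));
};
const sortDesc = (field) => (docs) => [...docs].sort((a, b) => b[field] - a[field]);
const limit = (n) => (docs) => docs.slice(0, n);

// Feed the output of each stage into the next one.
const runPipeline = (docs, stages) => stages.reduce((acc, stage) => stage(acc), docs);

const result = runPipeline(orders, [
  match((d) => d.status === "completed"),
  groupSum("customerId", "amount"),
  sortDesc("total"),
  limit(1),
]);
console.log(result); // [ { _id: 'c1', total: 100 } ]
```

Filtering first, as here, keeps later stages small; the same ordering matters in a real pipeline because an early $match can use an index.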
7.1 Common Stages#
// $match - filter documents (like find)
// $group - aggregate values
// $sort - order results
// $project - reshape documents
// $lookup - left outer join
// $unwind - deconstruct arrays
// $limit / $skip - pagination
db.orders.aggregate([
  // Stage 1: Filter
  { $match: { status: "completed", orderDate: { $gte: ISODate("2026-01-01") } } },
  // Stage 2: Group by customer
  { $group: {
    _id: "$customerId",
    totalSpent: { $sum: "$amount" },
    orderCount: { $count: {} },
    avgOrder: { $avg: "$amount" }
  }},
  // Stage 3: Sort by total spent
  { $sort: { totalSpent: -1 } },
  // Stage 4: Top 10
  { $limit: 10 }
])
7.2 $lookup (Join)#
db.orders.aggregate([
  { $lookup: {
    from: "customers",
    localField: "customerId",
    foreignField: "_id",
    as: "customer"
  }},
  { $unwind: "$customer" },
  { $project: {
    orderDate: 1,
    amount: 1,
    "customer.name": 1,
    "customer.email": 1
  }}
])
7.3 $bucket (Histogram)#
db.users.aggregate([
  { $bucket: {
    groupBy: "$age",
    boundaries: [0, 18, 30, 45, 60, 120],
    default: "Other",
    output: {
      count: { $sum: 1 },
      names: { $push: "$name" }
    }
  }}
])
8. Replica Set Configuration#
A replica set provides automatic failover. The recommended minimum is three members (one primary and two secondaries), so that a majority can still elect a primary if one member fails.
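The three-member recommendation follows from majority arithmetic: an election needs more than half of the voting members. A minimal sketch (the function names are illustrative, not part of any MongoDB API):

```javascript
// Votes needed to elect a primary, and how many voting members may fail
// while an election is still possible.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

for (const n of [2, 3, 4, 5]) {
  console.log(`${n} members: majority ${majority(n)}, tolerates ${faultTolerance(n)} failure(s)`);
}
// 2 members: majority 2, tolerates 0 failure(s)
// 3 members: majority 2, tolerates 1 failure(s)
// 4 members: majority 3, tolerates 1 failure(s)
// 5 members: majority 3, tolerates 2 failure(s)
```

Two members tolerate no failures, and a fourth member adds nothing over three, which is why odd member counts are the usual choice.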
8.1 Configure Each Member#
On each node, edit /etc/mongod.conf:
net:
  bindIp: 0.0.0.0
  port: 27017
replication:
  replSetName: "rs0"
security:
  keyFile: /etc/mongodb/keyfile # Shared secret for inter-member auth
8.2 Create Keyfile#
openssl rand -base64 756 > /etc/mongodb/keyfile
chmod 400 /etc/mongodb/keyfile
chown mongodb:mongodb /etc/mongodb/keyfile
# Copy the SAME keyfile to all replica set members
8.3 Initialize the Replica Set#
Connect to one member and initiate:
mongosh
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "192.0.2.11:27017", priority: 2 },
    { _id: 1, host: "192.0.2.12:27017", priority: 1 },
    { _id: 2, host: "192.0.2.13:27017", priority: 1 }
  ]
})
8.4 Verify Replica Set#
rs.status() // Detailed status
rs.conf() // Current configuration
db.hello() // Which node is primary (replaces the deprecated isMaster)
db.printReplicationInfo() // Oplog window
db.printSecondaryReplicationInfo() // Replication lag
8.5 Read Preferences#
// Read from nearest member (reduces latency)
db.users.find().readPref("nearest")
// Available preferences:
// primary - default, all reads from primary
// primaryPreferred - primary, fallback to secondary
// secondary - always read from secondary
// secondaryPreferred - secondary, fallback to primary
// nearest - lowest network latency member
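Read preference can also be set once in the connection string instead of per query. The sketch below assembles such a URI in plain JavaScript; `buildMongoUri` is a hypothetical helper, while `replicaSet`, `readPreference`, and `authSource` are standard connection string options. The hosts are the placeholder addresses used in this section.

```javascript
// Build a replica-set connection string with a validated read preference.
const READ_PREFERENCES = [
  "primary", "primaryPreferred", "secondary", "secondaryPreferred", "nearest",
];

function buildMongoUri({ hosts, replicaSet, readPreference = "primary", user, password, authSource = "admin" }) {
  if (!READ_PREFERENCES.includes(readPreference)) {
    throw new Error(`Unknown read preference: ${readPreference}`);
  }
  // Credentials must be percent-encoded so characters like "@" or ":" do not break the URI.
  const credentials = user
    ? `${encodeURIComponent(user)}:${encodeURIComponent(password)}@`
    : "";
  const params = new URLSearchParams({ replicaSet, readPreference });
  if (user) params.set("authSource", authSource);
  return `mongodb://${credentials}${hosts.join(",")}/?${params}`;
}

const uri = buildMongoUri({
  hosts: ["192.0.2.11:27017", "192.0.2.12:27017", "192.0.2.13:27017"],
  replicaSet: "rs0",
  readPreference: "secondaryPreferred",
});
console.log(uri);
// mongodb://192.0.2.11:27017,192.0.2.12:27017,192.0.2.13:27017/?replicaSet=rs0&readPreference=secondaryPreferred
```

Listing all members lets the driver discover the current primary even if the first host is down.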
8.6 Arbiter (When Only 2 Data Nodes Available)#
// Add an arbiter (votes but holds no data)
rs.addArb("192.0.2.14:27017")
9. Sharding Setup#
Sharding distributes data across multiple replica sets for horizontal scaling.
9.1 Architecture#
Application
|
mongos (query router)
|
Config servers (3-member replica set)
|
+---------+---------+
| Shard 1 | Shard 2 | ... (each shard is a replica set)
+---------+---------+
9.2 Deploy Config Servers#
On each config server, set /etc/mongod.conf:
sharding:
  clusterRole: configsvr
replication:
  replSetName: "configRS"
net:
  port: 27019
  bindIp: 0.0.0.0
Initialize:
rs.initiate({
  _id: "configRS",
  configsvr: true,
  members: [
    { _id: 0, host: "192.0.2.21:27019" },
    { _id: 1, host: "192.0.2.22:27019" },
    { _id: 2, host: "192.0.2.23:27019" }
  ]
})
9.3 Deploy Shard Replica Sets#
Each shard is a standard replica set configured with shardsvr:
sharding:
  clusterRole: shardsvr
replication:
  replSetName: "shard1RS"
net:
  port: 27018
  bindIp: 0.0.0.0
9.4 Start mongos Router#
mongos --configdb "configRS/192.0.2.21:27019,192.0.2.22:27019,192.0.2.23:27019" \
  --bind_ip 0.0.0.0 --port 27017
9.5 Add Shards and Enable Sharding#
Connect to mongos:
// Add each shard replica set
sh.addShard("shard1RS/192.0.2.31:27018,192.0.2.32:27018,192.0.2.33:27018")
sh.addShard("shard2RS/192.0.2.41:27018,192.0.2.42:27018,192.0.2.43:27018")
// Enable sharding on a database
sh.enableSharding("myapp")
// Shard a collection (choose shard key carefully)
sh.shardCollection("myapp.orders", { customerId: "hashed" })
// Or range-based:
sh.shardCollection("myapp.logs", { timestamp: 1 })
// Verify
sh.status()
9.6 Shard Key Selection#
| Strategy | Use Case | Pros | Cons |
|---|---|---|---|
| Hashed | Even distribution needed | Uniform writes | Range queries on the key are broadcast to all shards |
| Range | Time-series, range queries | Efficient range scans | Hot spots on recent data |
| Compound | Mixed query patterns | Balanced reads/writes | More complex to design |
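The hot-spot trade-off in the table can be illustrated with a small simulation. This is a deliberately simplified sketch: real clusters assign chunk ranges to shards and rebalance them, whereas the code below uses fixed equal-width ranges and a plain modulo over an MD5 digest as stand-ins. The shard count, key range, and function names are assumptions for illustration.

```javascript
const crypto = require("crypto");

const SHARDS = 4;

// Range stand-in: equal-width chunks over the key range seen so far, one per shard.
function rangeShard(key, min, max) {
  const width = (max - min) / SHARDS;
  return Math.min(SHARDS - 1, Math.max(0, Math.floor((key - min) / width)));
}

// Hashed stand-in: MD5 of the key, first 4 bytes modulo the shard count.
function hashedShard(key) {
  const digest = crypto.createHash("md5").update(String(key)).digest();
  return digest.readUInt32BE(0) % SHARDS;
}

// Count how many keys land on each shard under a given placement function.
function distribution(keys, pick) {
  const counts = new Array(SHARDS).fill(0);
  for (const k of keys) counts[pick(k)] += 1;
  return counts;
}

// Chunks were split over timestamps 0..999; new writes arrive with the latest timestamps.
const newWrites = Array.from({ length: 100 }, (_, i) => 900 + i);
const byRange = distribution(newWrites, (k) => rangeShard(k, 0, 1000));
const byHash = distribution(newWrites, hashedShard);
console.log("range :", byRange); // every new write lands on the last shard
console.log("hashed:", byHash);  // writes are spread across shards
```

With a monotonically increasing key such as a timestamp, the range placement sends all 100 new writes to the last shard, while the hashed placement spreads them at the cost of losing key locality for range scans.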
10. Backup and Restore#
10.1 mongodump / mongorestore#
# Full backup (all databases)
mongodump --uri="mongodb://<user>:<pass>@<host>:27017" \
--out=/backup/mongo/$(date +%Y%m%d)
# Single database
mongodump --db=<database> --out=/backup/mongo/$(date +%Y%m%d)
# Single collection
mongodump --db=<database> --collection=<collection> --out=/backup/
# Compressed backup
mongodump --gzip --archive=/backup/mongo-$(date +%Y%m%d).gz
# Restore full backup
mongorestore --drop /backup/mongo/20260322/
# Restore single database
mongorestore --db=<database> --drop /backup/mongo/20260322/<database>/
# Restore from compressed archive
mongorestore --gzip --archive=/backup/mongo-20260322.gz --drop
10.2 Filesystem Snapshot (WiredTiger)#
For large databases, filesystem snapshots are faster:
# Lock writes
mongosh --eval 'db.fsyncLock()'
# Take LVM or ZFS snapshot
lvcreate -L 10G -s -n mongo-snap /dev/data/mongodb
# Unlock
mongosh --eval 'db.fsyncUnlock()'
10.3 Automated Backup Script#
#!/bin/bash
# /usr/local/bin/mongo-backup.sh
BACKUP_DIR="/backup/mongodb"
RETENTION_DAYS=14
DATE=$(date +%Y%m%d-%H%M%S)
# Test the exit status of mongodump directly instead of inspecting $? afterwards
if mongodump --gzip --archive="${BACKUP_DIR}/mongo-${DATE}.gz" \
    --uri="mongodb://backupuser:<password>@localhost:27017/?authSource=admin"; then
  echo "Backup successful: ${BACKUP_DIR}/mongo-${DATE}.gz"
  # Prune archives older than the retention window
  find "${BACKUP_DIR}" -name "*.gz" -mtime +"${RETENTION_DAYS}" -delete
else
  echo "Backup FAILED" >&2
  exit 1
fi
10.4 Continuous Backup with Oplog#
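# Dump with oplog for a consistent point-in-time snapshot
# (--oplog needs a replica set member and a full dump; it cannot be combined with --db)
mongodump --oplog --out=/backup/mongo/$(date +%Y%m%d)
# Restore with oplog replay
mongorestore --oplogReplay /backup/mongo/20260322/
11. Performance Tuning#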
11.1 WiredTiger Cache#
storage:
  wiredTiger:
    engineConfig:
      # Default: 50% of (RAM - 1GB), min 256MB
      # Set explicitly for predictable behavior
      cacheSizeGB: 4
11.2 Connection Pool#
net:
  maxIncomingConnections: 65536
# Application-side: configure connection pool
# Example (Node.js driver):
# MongoClient.connect(uri, { maxPoolSize: 50, minPoolSize: 10 })
11.3 Read/Write Concerns#
// Write concern: acknowledge after majority replication
db.orders.insertOne(
  { item: "widget" },
  { writeConcern: { w: "majority", wtimeout: 5000 } }
)
// Read concern: only return data committed to majority
db.orders.find().readConcern("majority")
11.4 Profiler#
// Enable profiling for slow operations (> 100ms)
db.setProfilingLevel(1, { slowms: 100 })
// View slow queries
db.system.profile.find().sort({ ts: -1 }).limit(10).pretty()
// Disable profiling
db.setProfilingLevel(0)
12. Monitoring#
12.1 Built-in Commands#
// Server status overview
db.serverStatus()
// Current operations
db.currentOp()
// Collection statistics
db.orders.stats()
// Database statistics
db.stats()
12.2 mongotop and mongostat#
# Real-time read/write per collection
mongotop --uri="mongodb://admin:<pass>@localhost:27017/?authSource=admin" 5
# Server metrics (inserts/queries/updates/deletes per second)
mongostat --uri="mongodb://admin:<pass>@localhost:27017/?authSource=admin" 5
12.3 Key Metrics to Watch#
| Metric | Source | Healthy Value |
|---|---|---|
| Cache hit ratio | db.serverStatus().wiredTiger.cache | > 95% |
| Replication lag | rs.printSecondaryReplicationInfo() | < 10 seconds |
| Connections in use | db.serverStatus().connections.current | Well below maxIncomingConnections |
| Page faults | db.serverStatus().extra_info.page_faults | Low and stable |
| Opcounters | db.serverStatus().opcounters | Consistent with workload |
| Tickets available | db.serverStatus().wiredTiger.concurrentTransactions | > 0 |
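The cache hit ratio in the table is not reported directly; one common way to derive it is from two counters in db.serverStatus().wiredTiger.cache. The sketch below assumes the counters "pages requested from the cache" and "pages read into cache" and uses made-up sample numbers; `cacheHitRatio` is a hypothetical helper.

```javascript
// Derive a cache hit ratio from WiredTiger cache counters.
// `cache` is expected to look like db.serverStatus().wiredTiger.cache.
function cacheHitRatio(cache) {
  const requested = Number(cache["pages requested from the cache"]);
  const readIn = Number(cache["pages read into cache"]);
  if (!requested) return null; // no traffic yet, ratio undefined
  return 1 - readIn / requested;
}

// Sample counters (invented values for illustration).
const sample = {
  "pages requested from the cache": 200000,
  "pages read into cache": 4000,
};

const ratio = cacheHitRatio(sample);
console.log(`${(ratio * 100).toFixed(1)}%`); // 98.0%
```

The counters are cumulative since startup, so comparing the difference between two samples taken some minutes apart gives a more current picture than a single reading.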
13. Troubleshooting#
| Issue | Cause | Solution |
|---|---|---|
| MongoServerError: Authentication failed | Wrong credentials or auth database | Use --authenticationDatabase admin; verify user with db.getUsers() |
| Replica set election loops | Network flapping or clock skew | Check NTP sync; verify network between members; review rs.status() |
| Secondary falling behind | Oplog too small or heavy write load | Increase oplog size (replSetResizeOplog); check disk I/O on secondary |
| too many open files | ulimit too low for connection count | Set LimitNOFILE=65536 in systemd unit; verify with ulimit -n |
| Slow queries | Missing indexes or full collection scans | Enable profiler; check explain() for COLLSCAN; add appropriate indexes |
| Sharded cluster imbalance | Poor shard key choice | Monitor chunk distribution with sh.status(); consider resharding |
| Out of disk space | Journal + data + oplog filling disk | Add storage; enable directoryPerDB; compact with db.runCommand({compact:"collection"}) |
| WT_CACHE_FULL | WiredTiger cache undersized | Increase cacheSizeGB; check for bloated indexes |
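When deciding whether cacheSizeGB is undersized, it helps to know what the default would have been. A minimal sketch of the default sizing rule quoted in section 11.1 (the larger of 50% of RAM minus 1 GB, or 256 MB); the function name is illustrative, and in containers the server bases this on the memory limit it detects rather than host RAM.

```javascript
// Default WiredTiger cache size in GB for a host with `ramGB` of memory:
// the larger of 50% of (RAM - 1 GB) and 0.25 GB (256 MB).
function defaultWiredTigerCacheGB(ramGB) {
  return Math.max(0.5 * (ramGB - 1), 0.25);
}

for (const ram of [1, 4, 16, 64]) {
  console.log(`${ram} GB RAM -> ${defaultWiredTigerCacheGB(ram)} GB cache`);
}
// 1 GB RAM -> 0.25 GB cache
// 4 GB RAM -> 1.5 GB cache
// 16 GB RAM -> 7.5 GB cache
// 64 GB RAM -> 31.5 GB cache
```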