When AI scaffolds a backend, it needs somewhere to store data. Sometimes it reaches for a JSONWhat is json?A text format for exchanging data between systems. It uses key-value pairs and arrays, and every programming language can read and write it. file, sometimes it sets up a full database. Understanding why databases exist, and when a JSON file falls apart, is how you evaluate whether AI made the right architectural choice. Databases aren't just bigger files; they're fundamentally different tools designed for problems that simple storage can't solve.
fs.writeFileSync('data.json', ...) for persistence, especially in quick prototypes. This works for a demo but silently breaks under any real load. If AI generated a file-based storage layer, that's your cue to evaluate whether the project needs a real database.The JSONWhat is json?A text format for exchanging data between systems. It uses key-value pairs and arrays, and every programming language can read and write it. file problem
Imagine you're building a user registration system. Your first instinct might be:
// Simple approach that doesn't scale
const users = JSON.parse(readFileSync('users.json'));
users.push(newUser);
writeFileSync('users.json', JSON.stringify(users));This works for a demo, but falls apart in production. Here's why:
| Problem | What happens |
|---|---|
| Concurrency | Two users register at the same time → one overwrites the other |
| Performance | Loading the entire file to find one user → gets slow with 1M+ users |
| Search | Finding a user by email → must scan entire file every time |
| Integrity | Nothing prevents two users from having the same email address |
| Security | File permissions are hard to get right → data can leak |
What databases give you
Persistence
Data in a database survives anything short of a catastrophic disk failure. Restart your server? Data is still there. Power outage? Data is still there. Your app crashes? Data is still there.
This sounds obvious, but it's crucial. When a user clicks "save," they expect their data to be there tomorrow. Databases make that guarantee.
ConcurrencyWhat is concurrency?The ability of a program to handle multiple tasks at the same time, like serving thousands of users without slowing down.
Databases use sophisticated locking mechanisms to handle thousands of simultaneous users. When two people try to book the last seat on a flight, the database ensures only one succeeds. When a hundred users read an article while an admin edits it, everyone sees consistent data.
Performance
Databases are optimized for speed:
Indexing creates lookup tables that find data in milliseconds, even with millions of rows. It's like the indexWhat is index?A data structure the database maintains alongside a table so it can find rows by specific columns quickly instead of scanning everything. at the back of a textbook, instead of reading every page, you jump straight to what you need.
Query optimization means the database figures out the fastest way to answer your question. You ask "find users who joined last month," and the database chooses the best approach automatically.
Caching keeps frequently-used data in memory so it doesn't need to be read from disk.
Data integrity
Databases enforce rules about your data:
- Constraints ensure every user has a unique email
- Foreign keys prevent you from deleting a product that still has orders
- Data types reject attempts to store text in a number field
- Transactions ensure that money transfers completely succeed or completely fail, never half-complete
The ACIDWhat is acid?Four guarantees a database makes about transactions: changes are all-or-nothing, data stays valid, concurrent users don't interfere, and saved data survives crashes. guarantee
Relational databases promiseWhat is promise?An object that represents a value you don't have yet but will get in the future, letting your code keep running while it waits. ACID properties, the gold standard for data reliability:
Atomicity: A transactionWhat is transaction?A group of database operations that either all succeed together or all fail together, preventing partial updates. is all-or-nothing. If you're transferring money between accounts, either both the debit and credit happen, or neither does. No partial updates allowed.
Consistency: The database only moves between valid states. If you have a rule that "account balance cannot be negative," the database enforces it, even during complex operations.
Isolation: Concurrent transactions don't interfere with each other. Two people booking the same hotel room won't both get confirmed, the database ensures one goes first.
Durability: Once data is committed, it stays committed. Even if the power cuts out immediately after, your data is safe.
Types of databases
Not all databases work the same way. The two main categories are:
SQLWhat is sql?A language for querying and managing data in relational databases, letting you insert, read, update, and delete rows across tables. (Relational) databases
Store data in tables with predefined columns and relationships. Think Excel spreadsheets that can reference each other.
Examples: PostgreSQL, MySQL, SQLite, SQL Server
Best for: Structured data, business applications, financial systems
Language: SQL (Structured Query Language)
NoSQLWhat is nosql?A category of databases that store data without fixed table schemas, using documents, key-value pairs, or graphs instead of rows and columns. databases
More flexible approaches for specific use cases:
Document stores (MongoDB): Store JSONWhat is json?A text format for exchanging data between systems. It uses key-value pairs and arrays, and every programming language can read and write it.-like documents, good for flexible schemas
Key-value stores (Redis): Ultra-fast lookups, perfect for caching
Wide-column (Cassandra): Massive scale distributed data
Graph databases (Neo4j): Complex relationships like social networks
Typical application architecture
Real apps often use multiple databases for different jobs:
┌─────────────────┐
│ Web Client │
│ (React/Vue) │
└────────┬────────┘
│ HTTP/HTTPS
┌────────▼────────┐
│ API Server │
│ (Node.js) │
└────────┬────────┘
│
┌────┴────┬──────────┐
│ │ │
┌───▼───┐ ┌──▼───┐ ┌────▼────┐
│Postgre│ │Redis ││ S3 │
│ SQL │ │Cache ││ (Files)│
└───────┘ └──────┘ └─────────┘- PostgreSQL: Stores main application data (users, orders, products)
- Redis: Caches frequent queries, stores sessions, handles rate limitingWhat is rate limiting?Restricting how many requests a client can make within a time window. Prevents brute-force attacks and protects your API from being overwhelmed.
- S3 (or similar): Stores files like images and documents
Quick reference
| Need | Solution | Example |
|---|---|---|
| Simple config | JSON file | App settings |
| Local development | SQLite | Single-file database |
| Production web app | PostgreSQL | Most modern applications |
| Legacy PHP sites | MySQL | WordPress, shared hosting |
| Flexible prototyping | MongoDB | Rapidly evolving schemas |
| Speed caching | Redis | Sessions, real-time data |
// This is what happens when you visit a website:
// Without a database (JSON file approach)
// ❌ Slow: Must read entire file
// ❌ Unsafe: Race conditions on write
// ❌ Limited: No complex queries
const users = JSON.parse(fs.readFileSync('users.json'));
const user = users.find(u => u.id === userId);
// With a database
// ✅ Fast: Indexed lookup
// ✅ Safe: Handles concurrent access
// ✅ Powerful: Complex queries
const user = await db.query(
'SELECT * FROM users WHERE id = $1',
[userId]
);