Sensitive Data Exposure Code Review Guide
1. Introduction to Sensitive Data Exposure
Sensitive data exposure occurs when an application inadvertently reveals protected information — personally identifiable information (PII), credentials, financial data, health records, or internal system details. Unlike injection or authentication attacks where an attacker actively exploits a flaw, data exposure often happens passively: the application simply includes too much information in logs, API responses, error messages, or URLs.
The #1 Source of Data Breaches
Sensitive data exposure consistently ranks among the most common and costly vulnerability classes. OWASP lists it in the Top 10 as "Cryptographic Failures" (A02:2021), the category previously named "Sensitive Data Exposure" (A3:2017). According to IBM's 2023 Cost of a Data Breach Report, the average breach costs $4.45M, and customer PII is both the most commonly compromised and the most expensive type of data. Many of these breaches are not the result of sophisticated attacks; they stem from simple coding mistakes: logging passwords, returning too much data in API responses, or storing secrets in plaintext.
In this guide, you'll learn how to classify and identify sensitive data across different regulatory frameworks, how PII leaks through logs, error messages, and monitoring systems, why API over-exposure is one of the most pervasive data leak vectors, how insecure storage practices — from plaintext passwords to unencrypted backups — lead to breaches, and how to build systematic defenses at every stage of the data lifecycle.
Where Sensitive Data Leaks
[Figure: Data Flow Through an Application]
2. Real-World Scenario
The Scenario: You're reviewing a healthcare SaaS application that manages patient appointments and medical records. The application has a REST API consumed by a web frontend and a mobile app.
Healthcare API — Multiple Data Exposure Vulnerabilities
// --- Patient Appointment Endpoint ---
app.get('/api/appointments/:id', authenticate, async (req, res) => {
  const appointment = await db.query(
    'SELECT * FROM appointments WHERE id = $1', [req.params.id]
  );

  // ❌ Returns ENTIRE database row including:
  //   - patient SSN
  //   - insurance policy number
  //   - internal notes from doctor
  //   - billing codes
  //   - next_of_kin contact info
  res.json(appointment);
});

// --- User Login Endpoint ---
app.post('/api/auth/login', async (req, res) => {
  const { email, password } = req.body;

  // ❌ Logging the full request body, which includes the password!
  logger.info('Login attempt', { body: req.body });

  const user = await db.query(
    'SELECT * FROM users WHERE email = $1', [email]
  );

  // bcrypt.compare returns a Promise, so it must be awaited
  if (!user || !(await bcrypt.compare(password, user.password_hash))) {
    // ❌ Different error messages leak whether the email exists
    if (!user) {
      return res.status(401).json({ error: 'Email not found' });
    }
    return res.status(401).json({ error: 'Incorrect password' });
  }

  // ❌ Token includes sensitive data in the payload
  const token = jwt.sign({
    userId: user.id,
    email: user.email,
    ssn: user.ssn,       // ❌ SSN in JWT!
    role: user.role,
    salary: user.salary, // ❌ Salary in JWT!
  }, SECRET);

  res.json({ token });
});

// --- Error Handler ---
app.use((err, req, res, next) => {
  // ❌ Stack trace with internal paths sent to client
  // ❌ Database connection info in error details
  console.error('Unhandled error:', err);
  res.status(500).json({
    error: err.message,
    stack: err.stack,           // ❌ Full stack trace!
    query: err.sql,             // ❌ SQL query that failed!
    connectionString: err.host, // ❌ DB host exposed!
  });
});
Six Vulnerabilities in One File
This code contains: 1) API over-exposure (SELECT * returns all columns including SSN), 2) Password logging (request body logged on login), 3) Account enumeration (different error messages for 'email not found' vs 'wrong password'), 4) Sensitive data in JWT (SSN and salary in token payload — JWTs are base64-encoded, not encrypted), 5) Verbose error responses (stack traces and SQL queries sent to client), 6) Database info leakage (connection host in error response). These are all common patterns found in real production code.
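Two of the six issues, account enumeration and sensitive JWT claims, can be addressed with very little code. The sketch below is one possible shape rather than the application's actual fix: `UserRecord`, `TokenClaims`, `toTokenClaims`, and `loginFailure` are hypothetical names introduced for illustration, and the sample values are fake.

```typescript
// Hypothetical shape of the row loaded from the users table.
interface UserRecord {
  id: string;
  email: string;
  ssn: string;
  role: string;
  salary: number;
  passwordHash: string;
}

// Only what downstream services need in order to authorize a request.
interface TokenClaims {
  sub: string;
  role: string;
}

// Build claims from an explicit allowlist instead of copying the user row.
function toTokenClaims(user: UserRecord): TokenClaims {
  return { sub: user.id, role: user.role };
}

type LoginFailureReason = 'unknown_email' | 'bad_password';

// The reason is accepted so it can be recorded internally, but the client
// always receives the same status and message.
function loginFailure(_reason: LoginFailureReason): { status: number; error: string } {
  return { status: 401, error: 'Invalid email or password' };
}

const claims = toTokenClaims({
  id: 'u_123',
  email: 'pat@example.com',
  ssn: '000-00-0000',
  role: 'patient',
  salary: 0,
  passwordHash: 'placeholder-hash',
});

console.log(claims);
console.log(loginFailure('unknown_email'), loginFailure('bad_password'));
```

Anything else a service needs about the user (email, profile data) can be looked up server-side from `sub`, so it never has to travel inside the token.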
A JWT token contains the user's SSN in its payload. Why is this dangerous even though the JWT is signed?
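A signature only proves that the token was not modified; it does nothing to hide the payload. The sketch below illustrates this under simple assumptions: it hand-builds a token-shaped string (so no JWT library is needed) and then reads the payload back without any key. `readJwtPayload` is a hypothetical helper and the SSN is fake.

```typescript
// Encode a JSON value the way JWT segments are encoded (base64url).
const b64url = (value: object): string =>
  Buffer.from(JSON.stringify(value)).toString('base64url');

const header = b64url({ alg: 'HS256', typ: 'JWT' });
const payload = b64url({ userId: 'u_123', ssn: '000-00-0000' }); // fake data
const token = `${header}.${payload}.signature-is-not-needed-to-read`;

// What anyone holding the token can do, with no secret at all.
function readJwtPayload(jwt: string): Record<string, unknown> {
  const [, payloadPart] = jwt.split('.');
  return JSON.parse(Buffer.from(payloadPart, 'base64url').toString('utf8'));
}

const leaked = readJwtPayload(token);
console.log(leaked.ssn);
```

Tokens routinely end up in browser storage, proxy logs, and crash reports, so every claim should be treated as readable by whoever sees the token.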
3. Understanding Sensitive Data
Before you can prevent sensitive data exposure, you need to know what counts as sensitive data. The classification depends on the regulatory framework, industry, and jurisdiction.
Sensitive Data Classification
| Category | Examples | Regulations | Exposure Impact |
|---|---|---|---|
| PII (Personally Identifiable Information) | Name, email, phone, address, date of birth, IP address | GDPR, CCPA, PIPEDA | Identity theft, regulatory fines (up to 4% of global annual turnover under GDPR), reputational damage |
| Authentication Credentials | Passwords, password hashes, API keys, tokens, session IDs | PCI DSS, SOC 2, ISO 27001 | Account takeover, lateral movement, full system compromise |
| Financial Data | Credit card numbers, bank accounts, transaction history, salary | PCI DSS, SOX, GLBA | Financial fraud, regulatory fines, card-brand penalties |
| Health Information (PHI) | Medical records, diagnoses, prescriptions, insurance IDs | HIPAA, HITECH | Fines up to $1.5M per violation category per year, criminal penalties |
| Government IDs | SSN, passport number, driver's license, tax ID | Varies by jurisdiction | Identity theft, fraud, very high PII sensitivity |
| System / Infrastructure | DB connection strings, internal IPs, file paths, stack traces | SOC 2, ISO 27001 | Aids further attacks, enables targeted exploitation |
| Biometric Data | Fingerprints, facial recognition data, voice prints, retina scans | GDPR, BIPA, CCPA | Irrevocable — cannot be changed like a password |
The "Combined Data" Problem
Individual data points may not be sensitive alone, but combined they become PII. A zip code + date of birth + gender can uniquely identify 87% of the US population (Sweeney, 2000). During code review, consider not just individual fields but combinations that enable re-identification. This is especially relevant for analytics endpoints, exports, and data sharing with third parties.
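One way to make the combination risk concrete during review is a k-anonymity style check: group records by their quasi-identifiers and flag any combination shared by fewer than k people. The sketch below is a minimal illustration, not a complete re-identification analysis; `groupSizes`, `riskyCombinations`, and the sample rows are hypothetical.

```typescript
type Row = Record<string, string | number>;

// Count how many rows share each combination of quasi-identifier values.
function groupSizes(rows: Row[], quasiIdentifiers: string[]): Map<string, number> {
  const sizes = new Map<string, number>();
  for (const row of rows) {
    const key = quasiIdentifiers.map((field) => String(row[field])).join('|');
    sizes.set(key, (sizes.get(key) ?? 0) + 1);
  }
  return sizes;
}

// Return the combinations shared by fewer than k rows.
function riskyCombinations(rows: Row[], quasiIdentifiers: string[], k: number): string[] {
  return Array.from(groupSizes(rows, quasiIdentifiers).entries())
    .filter(([, size]) => size < k)
    .map(([key]) => key);
}

const sample: Row[] = [
  { zipCode: '02139', birthYear: 1985, gender: 'M', purchaseCount: 47 },
  { zipCode: '02139', birthYear: 1985, gender: 'M', purchaseCount: 3 },
  { zipCode: '02139', birthYear: 1990, gender: 'F', purchaseCount: 12 },
];

const risky = riskyCombinations(sample, ['zipCode', 'birthYear', 'gender'], 2);
console.log(risky);
```

In this sample the third row is the only one with its combination of zip code, birth year, and gender, so that combination is flagged. An export containing it would need generalization (for example, a wider age band) or suppression before sharing.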
Data Classification in Code
// ✅ Define data sensitivity at the schema level

enum DataSensitivity {
  PUBLIC = 'public',             // Can be freely shared
  INTERNAL = 'internal',         // For internal use only
  CONFIDENTIAL = 'confidential', // Restricted access
  RESTRICTED = 'restricted',     // Highest sensitivity (PII, PHI)
}

// ✅ Annotate your data models with sensitivity levels
interface User {
  id: string;            // INTERNAL
  displayName: string;   // PUBLIC
  email: string;         // CONFIDENTIAL (PII)
  passwordHash: string;  // RESTRICTED (never expose)
  ssn: string;           // RESTRICTED (PII)
  dateOfBirth: Date;     // CONFIDENTIAL (PII)
  phoneNumber: string;   // CONFIDENTIAL (PII)
  role: string;          // INTERNAL
  loginAttempts: number; // INTERNAL
  lastIpAddress: string; // CONFIDENTIAL (PII under GDPR)
  medicalNotes?: string; // RESTRICTED (PHI)
}

// ✅ Create view models that expose only what's needed
interface UserPublicProfile {
  id: string;
  displayName: string;
}

interface UserOwnProfile {
  id: string;
  displayName: string;
  email: string;       // User can see their own email
  phoneNumber: string; // User can see their own phone
  role: string;
}

// ✅ Map to appropriate view based on context
function toPublicProfile(user: User): UserPublicProfile {
  return { id: user.id, displayName: user.displayName };
}

function toOwnProfile(user: User): UserOwnProfile {
  return {
    id: user.id,
    displayName: user.displayName,
    email: user.email,
    phoneNumber: user.phoneNumber,
    role: user.role,
  };
}
An analytics API returns: { zipCode: '02139', birthYear: 1985, gender: 'M', purchaseCount: 47 }. Is this sensitive data?
4. PII in Logs & Error Messages
Logging is one of the most common vectors for sensitive data exposure. Developers log request and response bodies for debugging, error messages include stack traces with variable values, and monitoring tools capture everything. Once sensitive data enters log files, it persists for as long as the logs are retained (often indefinitely) and is readable by anyone with log access, which is often a much wider group than those with database access.
❌ Vulnerable: Common Logging Patterns
// Pattern 1: Logging full request bodies
app.use((req, res, next) => {
  // ❌ Logs EVERYTHING including passwords, tokens, credit cards
  logger.info('Incoming request', {
    method: req.method,
    url: req.url,
    body: req.body,       // ❌ Could contain password, CC number
    headers: req.headers, // ❌ Contains Authorization tokens
    cookies: req.cookies, // ❌ Contains session tokens
  });
  next();
});

// Pattern 2: Logging user objects
async function updateProfile(userId: string, updates: any) {
  const user = await User.findById(userId);
  // ❌ Logs entire user object including passwordHash, SSN
  logger.info('Updating user profile', { user, updates });
  Object.assign(user, updates);
  await user.save();
}

// Pattern 3: Error logging with context
async function processPayment(order: Order) {
  try {
    await paymentGateway.charge(order.creditCard, order.total);
  } catch (error) {
    // ❌ Logs the full order object including credit card details
    logger.error('Payment failed', {
      error: error.message,
      order: order, // ❌ Includes creditCard: { number, cvv, expiry }
      stack: error.stack,
    });
    throw error;
  }
}

// Pattern 4: Debug logging left in production
function authenticateUser(email: string, password: string) {
  // ❌ Debug line that was never removed
  console.log(`Auth attempt: email=${email} password=${password}`);
  // ...
}
✅ Secure: Safe Logging Patterns
// ✅ Define a list of fields that must NEVER be logged.
// Entries are lowercased once so the lookup below is case-insensitive.
const REDACTED_FIELDS = new Set(
  [
    'password', 'passwordHash', 'password_hash',
    'ssn', 'socialSecurityNumber',
    'creditCard', 'cardNumber', 'cvv', 'ccv',
    'token', 'accessToken', 'refreshToken', 'sessionId',
    'secret', 'apiKey', 'privateKey',
    'authorization',
  ].map((field) => field.toLowerCase())
);

function sanitizeForLogging(obj: any, depth = 0): any {
  if (depth > 5) return '[nested]';
  if (obj === null || obj === undefined) return obj;
  if (typeof obj !== 'object') return obj;

  if (Array.isArray(obj)) {
    return obj.map(item => sanitizeForLogging(item, depth + 1));
  }

  const sanitized: Record<string, any> = {};
  for (const [key, value] of Object.entries(obj)) {
    if (REDACTED_FIELDS.has(key.toLowerCase())) {
      sanitized[key] = '[REDACTED]';
    } else if (typeof value === 'string' && value.length > 200) {
      sanitized[key] = value.substring(0, 50) + '...[truncated]';
    } else {
      sanitized[key] = sanitizeForLogging(value, depth + 1);
    }
  }
  return sanitized;
}

// ✅ Safe request logging middleware
app.use((req, res, next) => {
  logger.info('Incoming request', {
    method: req.method,
    // ✅ Log the path only; query strings can carry tokens and PII
    path: req.path,
    // ✅ Only log specific, known-safe headers
    userAgent: req.headers['user-agent'],
    contentType: req.headers['content-type'],
    // ✅ Sanitize body before logging
    body: sanitizeForLogging(req.body),
    // ✅ Never log the full headers object (contains auth tokens)
  });
  next();
});

// ✅ Safe error logging
async function processPayment(order: Order) {
  try {
    await paymentGateway.charge(order.creditCard, order.total);
  } catch (error) {
    logger.error('Payment failed', {
      error: error instanceof Error ? error.message : String(error),
      // ✅ Only log non-sensitive order fields
      orderId: order.id,
      amount: order.total,
      // ✅ Never log credit card details
    });
    throw error;
  }
}
A developer adds logger.error('Auth failed', { email, password, reason }) to debug login issues. What should you flag in code review?
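Key-based redaction cannot help when a secret has already been interpolated into the message string, as in the debug-logging pattern shown earlier. A complementary approach is to scrub values that look sensitive. The sketch below is a rough illustration using two regular expressions; the patterns and the `scrubLogMessage` name are assumptions, and pattern matching like this produces both false positives (long numeric IDs) and false negatives (unusual formats), so it is a backstop rather than a replacement for not logging the data.

```typescript
// US SSN written with dashes, e.g. 123-45-6789.
const SSN_PATTERN = /\b\d{3}-\d{2}-\d{4}\b/g;
// 13 to 19 digits, optionally separated by single spaces or hyphens.
const CARD_PATTERN = /\b\d(?:[ -]?\d){12,18}\b/g;

// Replace SSN-like and card-like substrings before the message is written.
function scrubLogMessage(message: string): string {
  return message
    .replace(SSN_PATTERN, '[REDACTED_SSN]')
    .replace(CARD_PATTERN, '[REDACTED_CARD]');
}

const scrubbed = scrubLogMessage(
  'Payment failed: ssn=123-45-6789 card=4111 1111 1111 1111 declined'
);
console.log(scrubbed);
```

A scrubber like this is typically wired in as a formatter or transport hook in the logging library so that it applies to every message, including ones written by third-party code.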
5. API Over-Exposure
API over-exposure (listed as "Excessive Data Exposure", API3:2019, in the OWASP API Security Top 10) occurs when API endpoints return more data than the client needs. The most common pattern is SELECT * from the database piped directly into the API response, relying on the frontend to show only the relevant fields. The frontend is not a security boundary: anyone can read the raw response in browser developer tools or call the API directly.
❌ Vulnerable: API Over-Exposure Patterns
// Pattern 1: SELECT * → response
app.get('/api/users/:id', async (req, res) => {
  // ❌ Fetches ALL columns including passwordHash, SSN, etc.
  const user = await db.query('SELECT * FROM users WHERE id = $1', [req.params.id]);
  // ❌ Returns entire row to the client
  res.json(user);
  // Client receives: { id, name, email, passwordHash, ssn,
  //   salary, internalNotes, loginAttempts, ... }
});

// Pattern 2: ORM includes related data by default
app.get('/api/orders/:id', async (req, res) => {
  const order = await Order.findById(req.params.id)
    .populate('customer')       // ❌ Includes full customer object
    .populate('payments')       // ❌ Includes payment card details
    .populate('internalNotes'); // ❌ Includes staff comments
  res.json(order);
});

// Pattern 3: List endpoint returns too much
app.get('/api/users', async (req, res) => {
  // ❌ Returns ALL user fields for ALL users
  const users = await User.find({});
  res.json(users);
  // An attacker with a valid session can enumerate
  // every user's email, phone, role, etc.
});

// Pattern 4: Search leaks data
app.get('/api/search', async (req, res) => {
  const results = await db.query(
    "SELECT * FROM products WHERE name LIKE $1",
    [`%${req.query.q}%`]
  );
  // ❌ Returns internal fields: cost_price, supplier_id,
  //   margin_percentage, internal_sku, warehouse_location
  res.json(results);
});
✅ Secure: Explicit Field Selection
// ✅ Define response schemas for each endpoint
interface UserListResponse {
  id: string;
  displayName: string;
  avatarUrl: string;
}

interface UserDetailResponse {
  id: string;
  displayName: string;
  email?: string; // Present only when users request their own profile
  avatarUrl: string;
  memberSince: string;
}

// ✅ Select only needed columns
app.get('/api/users/:id', authenticate, async (req, res) => {
  const isOwnProfile = req.user.id === req.params.id;

  // ✅ Different fields based on who is requesting.
  // Column names come from these hard-coded lists, never from user input.
  const fields = isOwnProfile
    ? ['id', 'display_name', 'email', 'avatar_url', 'created_at']
    : ['id', 'display_name', 'avatar_url', 'created_at'];

  const user = await db.query(
    `SELECT ${fields.join(', ')} FROM users WHERE id = $1`,
    [req.params.id]
  );

  if (!user) return res.status(404).json({ error: 'Not found' });

  // ✅ Map to response DTO (Data Transfer Object)
  const response: UserDetailResponse = {
    id: user.id,
    displayName: user.display_name,
    ...(isOwnProfile && { email: user.email }),
    avatarUrl: user.avatar_url,
    memberSince: user.created_at,
  };

  res.json(response);
});

// ✅ List endpoint with minimal fields
app.get('/api/users', authenticate, async (req, res) => {
  // ✅ Query parameters arrive as strings: parse and clamp them
  const limit = Math.min(Math.max(Number(req.query.limit) || 20, 1), 100);
  const offset = Math.max(Number(req.query.offset) || 0, 0);

  // ✅ Only return public profile fields
  const users = await db.query(
    'SELECT id, display_name, avatar_url FROM users LIMIT $1 OFFSET $2',
    [limit, offset]
  );

  const response: UserListResponse[] = users.map(u => ({
    id: u.id,
    displayName: u.display_name,
    avatarUrl: u.avatar_url,
  }));

  // count is the size of this page, not the total number of users
  res.json({ data: response, count: response.length });
});
Your API returns full user objects from the database, and the frontend only displays name and avatar. A QA engineer says 'it works fine.' What do you say in code review?
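"It works fine" describes what the UI renders, not what travels over the wire. A small allowlist helper keeps the response limited to named fields, so a column added to the table later does not silently appear in the API. The following is a minimal sketch; `pick`, `UserRow`, and `PUBLIC_USER_FIELDS` are hypothetical names and the row values are fake.

```typescript
// Copy only the listed keys from source; everything else is dropped.
function pick<T extends object, K extends keyof T>(
  source: T,
  keys: readonly K[]
): Pick<T, K> {
  const result = {} as Pick<T, K>;
  for (const key of keys) {
    result[key] = source[key];
  }
  return result;
}

interface UserRow {
  id: string;
  display_name: string;
  avatar_url: string;
  email: string;
  password_hash: string;
  ssn: string;
}

// The allowlist lives next to the endpoint and is reviewed like any other code.
const PUBLIC_USER_FIELDS = ['id', 'display_name', 'avatar_url'] as const;

const row: UserRow = {
  id: 'u_123',
  display_name: 'Pat',
  avatar_url: 'https://example.com/a.png',
  email: 'pat@example.com',
  password_hash: 'placeholder-hash',
  ssn: '000-00-0000',
};

const publicView = pick(row, PUBLIC_USER_FIELDS);
console.log(Object.keys(publicView));
```

Because the return type is `Pick<UserRow, ...>`, code that tries to read `publicView.ssn` fails at compile time, which turns an over-exposure mistake into a type error.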
6. Insecure Data Storage
How data is stored determines how much damage occurs when (not if) an attacker gains database access. Plaintext passwords, unencrypted PII, and overly broad data retention multiply the impact of every data breach.
❌ Vulnerable: Insecure Storage Patterns
import { createHash } from 'crypto';

// Pattern 1: Plaintext passwords
async function createUser(email: string, password: string) {
  // ❌ Storing password in plaintext!
  await db.query(
    'INSERT INTO users (email, password) VALUES ($1, $2)',
    [email, password]
  );
}

// Pattern 2: Weak hashing
function hashPassword(password: string) {
  // ❌ MD5 is fast and unsalted here: precomputed tables and GPU
  //    cracking recover common passwords almost instantly
  return createHash('md5').update(password).digest('hex');
  // ❌ SHA-256 without salt is also insufficient
  // return createHash('sha256').update(password).digest('hex');
}

// Pattern 3: PII stored in plaintext
async function savePatientRecord(patient: any) {
  // ❌ SSN, medical data stored without encryption
  await db.query(
    'INSERT INTO patients (name, ssn, diagnosis, insurance_id) VALUES ($1, $2, $3, $4)',
    [patient.name, patient.ssn, patient.diagnosis, patient.insuranceId]
  );
}

// Pattern 4: Sensitive data in client-side storage
function rememberUser(user: any) {
  // ❌ Storing sensitive data in localStorage (accessible to any JS)
  localStorage.setItem('user', JSON.stringify({
    id: user.id,
    email: user.email,
    ssn: user.ssn,         // ❌ SSN in localStorage!
    authToken: user.token, // ❌ Token in localStorage (XSS can steal it)
  }));
}

// Pattern 5: No data retention limits
// ❌ User data kept forever, even for deleted accounts
// ❌ Logs with PII retained indefinitely
// ❌ Database backups contain plaintext PII with no expiry
✅ Secure: Proper Data Storage
import bcrypt from 'bcrypt';
import { createCipheriv, createDecipheriv, randomBytes } from 'crypto';

// ✅ Password hashing with bcrypt (adaptive, salted)
async function createUser(email: string, password: string) {
  // ✅ bcrypt with cost factor 12 (auto-generates salt)
  const passwordHash = await bcrypt.hash(password, 12);
  await db.query(
    'INSERT INTO users (email, password_hash) VALUES ($1, $2)',
    [email, passwordHash]
  );
  // ✅ Original password is never stored anywhere
}

// ✅ Application-level encryption for PII
class FieldEncryption {
  // readonly keeps the literal type so TypeScript selects the GCM overloads
  private readonly algorithm = 'aes-256-gcm';
  private readonly key: Buffer;

  constructor(encryptionKey: string) {
    this.key = Buffer.from(encryptionKey, 'hex');
    if (this.key.length !== 32) {
      throw new Error('Encryption key must be 32 bytes (64 hex characters)');
    }
  }

  encrypt(plaintext: string): string {
    // ✅ Fresh random 96-bit IV per message, the size recommended for GCM
    const iv = randomBytes(12);
    const cipher = createCipheriv(this.algorithm, this.key, iv);
    let encrypted = cipher.update(plaintext, 'utf8', 'hex');
    encrypted += cipher.final('hex');
    const tag = cipher.getAuthTag().toString('hex');
    // ✅ Return IV + tag + ciphertext (all needed for decryption)
    return iv.toString('hex') + ':' + tag + ':' + encrypted;
  }

  decrypt(ciphertext: string): string {
    const [ivHex, tagHex, encrypted] = ciphertext.split(':');
    const iv = Buffer.from(ivHex, 'hex');
    const tag = Buffer.from(tagHex, 'hex');
    const decipher = createDecipheriv(this.algorithm, this.key, iv);
    // ✅ final() throws if the ciphertext or tag was tampered with
    decipher.setAuthTag(tag);
    let decrypted = decipher.update(encrypted, 'hex', 'utf8');
    decrypted += decipher.final('utf8');
    return decrypted;
  }
}

// ✅ Encrypt sensitive fields before storage
const fieldEncryption = new FieldEncryption(process.env.FIELD_ENCRYPTION_KEY!);

interface PatientInput {
  name: string;
  ssn: string;
  diagnosis: string;
}

async function savePatientRecord(patient: PatientInput) {
  await db.query(
    'INSERT INTO patients (name_encrypted, ssn_encrypted, diagnosis_encrypted) VALUES ($1, $2, $3)',
    [
      fieldEncryption.encrypt(patient.name),
      fieldEncryption.encrypt(patient.ssn),
      fieldEncryption.encrypt(patient.diagnosis),
    ]
  );
}
A developer uses SHA-256 to hash passwords before storing them. Is this secure?
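The short answer is no: SHA-256 is designed to be fast, and without a per-user salt two users with the same password get the same hash. The sketch below contrasts that with a salted, deliberately slow key-derivation function. It uses scrypt from Node's built-in `crypto` module instead of the bcrypt package used above, purely so the example has no external dependency; `hashPassword`, `verifyPassword`, and the `salt:hash` storage format are assumptions made for illustration.

```typescript
import { createHash, randomBytes, scryptSync, timingSafeEqual } from 'node:crypto';

// Unsalted fast hash: identical passwords always produce identical output.
const sha256 = (password: string): string =>
  createHash('sha256').update(password).digest('hex');

// Salted scrypt hash, stored as "saltHex:hashHex" (hypothetical format).
function hashPassword(password: string): string {
  const salt = randomBytes(16);
  const derived = scryptSync(password, salt, 64);
  return `${salt.toString('hex')}:${derived.toString('hex')}`;
}

function verifyPassword(password: string, stored: string): boolean {
  const [saltHex, hashHex] = stored.split(':');
  if (!saltHex || !hashHex) return false;
  const expected = Buffer.from(hashHex, 'hex');
  if (expected.length === 0) return false;
  const actual = scryptSync(password, Buffer.from(saltHex, 'hex'), expected.length);
  // Constant-time comparison avoids leaking how many bytes matched.
  return timingSafeEqual(actual, expected);
}

const first = hashPassword('correct horse');
const second = hashPassword('correct horse');

console.log(sha256('correct horse') === sha256('correct horse')); // same input, same hash
console.log(first === second); // different salts, different stored values
console.log(verifyPassword('correct horse', first));
console.log(verifyPassword('wrong guess', first));
```

Whichever algorithm is chosen (bcrypt, scrypt, or Argon2), the properties to look for in review are the same: a unique salt per password and a tunable work factor.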
7. Prevention Techniques
Defense-in-Depth for Data Protection
1) Classify: Know what data is sensitive and label it in your schemas.
2) Minimize: Collect and return only what you need.
3) Encrypt: At rest (AES-256) and in transit (TLS 1.3).
4) Log Safely: Redact sensitive fields from all log output.
5) Respond Safely: Use DTOs/view models; never return raw database objects.
6) Handle Errors Safely: Generic messages to users, detailed logs internally.
7) Retain Minimally: Delete data when it's no longer needed.
8) Audit: Monitor for data access anomalies.
✅ Comprehensive Data Protection Middleware
import type { Request, Response, NextFunction } from 'express';

// ✅ Response sanitization middleware
// deepOmit is assumed to be a project helper that recursively removes
// the named keys from objects and arrays; it is not defined in this guide.
function responseSanitizer() {
  return (req: Request, res: Response, next: NextFunction) => {
    const originalJson = res.json.bind(res);

    res.json = (body: any) => {
      // ✅ Strip sensitive fields from ANY response
      const sanitized = deepOmit(body, [
        'passwordHash', 'password_hash', 'password',
        'ssn', 'socialSecurityNumber',
        'creditCardNumber', 'cvv',
        'internalNotes', 'internal_notes',
        'costPrice', 'cost_price', 'margin',
      ]);

      // ✅ Strip internal database fields
      const cleaned = deepOmit(sanitized, [
        '__v', '_id', 'createdBy', 'updatedBy',
        'deletedAt', 'is_deleted',
      ]);

      return originalJson(cleaned);
    };

    next();
  };
}

// ✅ Security headers for data protection
app.use((req, res, next) => {
  // ✅ Prevent browser from caching sensitive responses
  res.setHeader('Cache-Control', 'no-store, no-cache, must-revalidate');
  res.setHeader('Pragma', 'no-cache');

  // ✅ Prevent MIME type sniffing
  res.setHeader('X-Content-Type-Options', 'nosniff');

  // ✅ Strict Transport Security
  res.setHeader('Strict-Transport-Security', 'max-age=63072000; includeSubDomains');

  next();
});

// ✅ Data retention enforcement
async function enforceRetentionPolicy() {
  // ✅ Delete accounts inactive for > 2 years
  await db.query(
    "DELETE FROM users WHERE last_active < NOW() - INTERVAL '2 years' AND status = 'inactive'"
  );

  // ✅ Purge old logs with PII
  await db.query(
    "DELETE FROM audit_logs WHERE created_at < NOW() - INTERVAL '90 days'"
  );

  // ✅ Anonymize old order data (keep for analytics, remove PII)
  await db.query(`
    UPDATE orders SET
      customer_name = 'ANONYMIZED',
      customer_email = 'ANONYMIZED',
      shipping_address = 'ANONYMIZED'
    WHERE created_at < NOW() - INTERVAL '3 years'
  `);
}
✅ Secure Error Handling
import { randomUUID } from 'crypto';

// ✅ Error handler that never leaks internals
app.use((err: Error, req: Request, res: Response, next: NextFunction) => {
  // ✅ Generate a unique error ID for correlation
  const errorId = randomUUID();

  // ✅ Log the FULL error internally (for debugging)
  logger.error('Unhandled error', {
    errorId,
    message: err.message,
    stack: err.stack,
    url: req.url,
    method: req.method,
    userId: req.user?.id,
    // ✅ Do NOT log request body (may contain PII)
  });

  // ✅ Return a GENERIC error to the client
  const statusCode = (err as any).statusCode || 500;
  res.status(statusCode).json({
    error: statusCode === 500
      ? 'An internal error occurred. Please try again later.'
      : err.message, // Only expose message for expected errors (4xx)
    errorId, // ✅ Client can reference this for support
    // ✅ NEVER include: stack, sql, query, connectionString, path
  });
});
Which approach provides the strongest protection against sensitive data exposure?
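The response sanitization middleware in this section calls a `deepOmit` helper that the guide does not define. One possible implementation is sketched below under stated assumptions: it only walks plain objects and arrays, so class instances such as ORM documents, Dates, or Buffers are passed through unchanged and would need to be converted to plain objects first. Treat it as a safety net behind explicit DTOs, not as the primary control.

```typescript
// Recursively remove the named keys from plain objects and arrays.
function deepOmit(value: unknown, keysToOmit: readonly string[]): unknown {
  const blocked = new Set(keysToOmit);

  const visit = (node: unknown): unknown => {
    if (Array.isArray(node)) {
      return node.map((item) => visit(item));
    }
    if (node === null || typeof node !== 'object') {
      return node; // primitives pass through untouched
    }
    // Only walk plain objects; other instances are returned as-is.
    const proto = Object.getPrototypeOf(node);
    if (proto !== Object.prototype && proto !== null) {
      return node;
    }
    const result: Record<string, unknown> = {};
    for (const [key, child] of Object.entries(node as Record<string, unknown>)) {
      if (!blocked.has(key)) {
        result[key] = visit(child);
      }
    }
    return result;
  };

  return visit(value);
}

const cleaned = deepOmit(
  { id: 1, ssn: '000-00-0000', orders: [{ total: 5, cvv: '123' }] },
  ['ssn', 'cvv']
);
console.log(JSON.stringify(cleaned));
```

The function returns a new structure and leaves its input unmodified, so the original object can still be used for internal processing after the response is sent.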