JSON vs YAML vs CSV: Choosing the Right Data Format

JSON, YAML, and CSV solve overlapping problems in incompatible ways. JSON dominates web APIs, YAML rules configuration files, and CSV remains the lingua franca of tabular data and spreadsheets. Picking the wrong one for a given job leads to brittle parsing, silent data corruption, or files no human can edit.

This guide walks through what each format is, where it shines, where it bites, and how to choose. It also covers what gets lost when you convert between them, because in practice you will convert between them constantly.

Where Each Format Came From

JSON (JavaScript Object Notation) was specified by Douglas Crockford in the early 2000s as a subset of JavaScript object literal syntax. It describes a small set of types: objects, arrays, strings, numbers, booleans, and null. Its grammar fits on a business card, which is precisely why it won as the default API serialization format.

YAML (YAML Ain't Markup Language) arrived around the same time with a different goal: a serialization format optimized for humans to read and write by hand. Since version 1.2 it has been designed as a superset of JSON, so almost any valid JSON document is also valid YAML, but its native syntax leans on indentation and a rich set of conveniences like comments and anchors.

CSV (comma-separated values) predates both by decades. It had no single authoritative standard until RFC 4180 attempted to codify common practice in 2005, and even that is widely ignored. CSV is not a data structure so much as a convention: rows of fields separated by a delimiter, optionally quoted.

JSON: The Default for APIs and Interchange

JSON's strengths are ubiquity and predictability. Every mainstream language ships or readily offers a parser, the type model is small and explicit, and parsing is fast. A string is always a string and a number is always a number, so round-tripping data between services rarely surprises you. For request and response bodies, message queues, log lines, and config consumed by machines rather than edited by humans, JSON is the safe default.
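As a small illustration of that predictability, the sketch below round-trips a payload through Python's standard json module. The payload and its field names are made up for the example.

```python
import json

# Hypothetical payload; the field names are illustrative only.
payload = {
    "id": 42,
    "name": "widget",
    "price": 9.99,
    "tags": ["a", "b"],
    "active": True,
    "parent": None,
}

encoded = json.dumps(payload)   # Python objects -> JSON text
decoded = json.loads(encoded)   # JSON text -> Python objects

# Each value comes back with the type it went in with.
types = {key: type(value).__name__ for key, value in decoded.items()}
print(types)
```

The six JSON types map onto dict, list, str, int or float, bool, and None, which is why the decoded value compares equal to the original.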

Its weaknesses show up the moment a human has to maintain a JSON file. There are no comments, so you cannot annotate why a setting exists. Strict syntax means a single trailing comma breaks the whole document. It is also verbose: every key is quoted and structure is carried by braces and brackets rather than layout, which makes large nested config tedious to hand-edit. JSON also has no native date type, so timestamps live as strings or numbers by convention. And because many parsers, JavaScript's included, store every number as a double-precision float, integers beyond 2^53 can silently lose precision.
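Three of these weaknesses are easy to reproduce. The sketch below uses only Python's standard library; the key names are hypothetical, and the ISO 8601 string is one common convention for dates rather than anything JSON itself defines.

```python
import json
from datetime import datetime, timezone

# 1. A trailing comma is a syntax error in strict JSON.
try:
    json.loads('{"retries": 3,}')
    trailing_comma_ok = True
except json.JSONDecodeError:
    trailing_comma_ok = False

# 2. Integers beyond 2**53 do not survive a trip through a double.
big = 2**53 + 1                        # 9007199254740993
lost_precision = int(float(big)) != big

# A common workaround is to send such identifiers as strings.
safe = json.loads(json.dumps({"id": str(big)}))

# 3. There is no date type, so timestamps travel as text by convention.
stamp = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
encoded = json.dumps({"created_at": stamp.isoformat()})
restored = datetime.fromisoformat(json.loads(encoded)["created_at"])
```

Python's own parser keeps integers exact, so the precision loss here is simulated with float(); it is what a consumer that stores numbers as doubles would see.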

YAML: Readable Config, With Sharp Edges

YAML trades strictness for readability. Structure comes from indentation rather than punctuation, you can write comments with a hash, and you can reuse blocks with anchors and aliases (define a value once with an ampersand, reference it later with an asterisk). This is why Kubernetes manifests, GitHub Actions and GitLab CI pipelines, Ansible playbooks, and Docker Compose files all use YAML. For configuration that humans read and edit daily, it is hard to beat.

The cost is fragility. Because indentation is significant, a stray space can change meaning or break parsing entirely, and tabs are not allowed for indentation at all; the error often points at the wrong line. Worse is implicit type coercion, the infamous 'Norway problem': under YAML 1.1 rules, which many popular parsers still follow, the unquoted country code NO is interpreted as the boolean false, so a list of country codes silently turns Norway into a falsy value. The same trap catches version strings like 1.10 (parsed as the number 1.1, dropping the trailing zero), bare values like on, off, and yes, and times like 22:22, which YAML 1.1 reads as the base-60 integer 1342. YAML 1.2 narrowed booleans to true and false, but you rarely control which version every tool in the chain implements. The defense is to quote any scalar whose literal text matters. YAML's full specification is also large and parsers differ in support, so advanced features do not always travel between tools.
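One way to apply the quoting defense is to lint values before they are written into a YAML file. The function below is a hypothetical helper, not part of any YAML library: a rough and deliberately conservative approximation of YAML 1.1 implicit typing that flags scalars a parser might coerce. Real parsers differ in the details.

```python
import re

# Rough approximation of YAML 1.1 implicit typing. Matching is
# case-insensitive, so it flags slightly more than a real parser would.
_BOOL_WORDS = {"y", "n", "yes", "no", "on", "off", "true", "false"}
_NULL_WORDS = {"", "~", "null"}
_NUMBER = re.compile(r"[-+]?(\d+\.?\d*|\.\d+)([eE][-+]?\d+)?")
_SEXAGESIMAL = re.compile(r"[-+]?\d+(:[0-5]?\d)+")


def needs_quotes(scalar: str) -> bool:
    """Return True if an unquoted scalar may not stay a plain string."""
    text = scalar.strip()
    lowered = text.lower()
    return (
        lowered in _BOOL_WORDS
        or lowered in _NULL_WORDS
        or _NUMBER.fullmatch(text) is not None
        or _SEXAGESIMAL.fullmatch(text) is not None
    )


country_codes = ["SE", "NO", "DK"]
risky = [code for code in country_codes if needs_quotes(code)]
print(risky)
```

A check like this could run in CI over generated config. Erring on the side of quoting is harmless, since a quoted string is always read as a string.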

CSV: Flat, Fast, and Everywhere in Spreadsheets

CSV is the right tool for tabular data: a fixed set of columns repeated over many rows. It opens directly in Excel, Google Sheets, and every database import wizard, and it streams beautifully. Because each record is independent, you can process a multi-gigabyte file record by record without loading it into memory, which is something a single large JSON array or YAML document does not handle gracefully. For data exports, analytics dumps, and bulk loads, CSV is usually the fastest path.
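A minimal sketch of that streaming pattern with Python's csv module follows. The column names region and amount are assumptions for the example, and io.StringIO stands in for a large file on disk.

```python
import csv
import io


def total_by_region(lines):
    """Sum the amount column per region, holding one record at a time."""
    totals = {}
    for row in csv.DictReader(lines):
        region = row["region"]
        totals[region] = totals.get(region, 0.0) + float(row["amount"])
    return totals


# In real use this would be: open(path, newline="", encoding="utf-8")
sample = io.StringIO("region,amount\nnorth,10.5\nsouth,4\nnorth,1.5\n")
totals = total_by_region(sample)
print(totals)
```

Memory use depends on the number of distinct regions, not the number of rows, because the reader yields one record at a time.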

Its limitations are structural. CSV has no concept of nesting, so anything hierarchical has to be flattened into columns or stuffed into a cell as encoded text. It carries no type information; every field is just text, and whether 007 is a string or the number 7 is left to the reader. Escaping is genuinely ambiguous in practice: fields containing the delimiter, quotes, or newlines must be quoted and escaped, but tools disagree on the rules, and regional differences mean some locales use semicolons because the comma is a decimal separator. There is also no required header row, so the meaning of each column is a convention you have to trust.
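The sketch below shows how Python's csv module handles these cases under its default dialect: fields containing commas, quotes, or newlines are quoted, embedded quotes are doubled, and everything reads back as text. The sample rows are invented for illustration, and other tools may apply different rules.

```python
import csv
import io

rows = [
    ["id", "note"],
    ["007", 'She said "hi", then left'],   # contains quotes and a comma
    ["008", "line one\nline two"],         # contains a newline
]

# newline="" leaves line endings to the csv module, as the csv docs recommend.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
encoded = buffer.getvalue()

decoded = list(csv.reader(io.StringIO(encoded, newline="")))

# Every field is text: "007" keeps its zeros only because nobody cast it.
first_id = decoded[1][0]

# Semicolon-delimited data, common where the comma is the decimal separator.
european = list(csv.reader(io.StringIO("price;qty\n1,5;2\n"), delimiter=";"))
```

Splitting lines on commas by hand would break on the second and third rows, which is the usual argument for using a real CSV parser even on data that looks simple.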

A Practical Decision Guide

Reach for JSON when machines are the primary readers: REST and GraphQL APIs, browser-to-server payloads, message buses, and structured logs. Its unambiguous types and universal parser support make it the lowest-friction choice for data in motion between systems.

Reach for YAML when humans edit the file regularly and structure is nested: application config, infrastructure-as-code, and CI/CD pipelines. The comments and reduced punctuation pay off every time someone opens the file, as long as your team knows to quote ambiguous scalars.

Reach for CSV when the data is genuinely tabular and headed for a spreadsheet, a SQL bulk import, or a streaming pipeline. If your data has no nesting and you care about row-by-row throughput or non-developer consumers, CSV is the pragmatic answer. When you find yourself encoding nested objects into CSV cells, that is the signal you have outgrown it and should move to JSON.

Converting Between Formats and What You Lose

Conversions are common, but they are rarely lossless because the formats do not share a type and structure model. Going from YAML to JSON discards every comment and collapses anchors into their expanded values, so the human-friendly annotations that justified YAML in the first place vanish. Going the other way, JSON to YAML, produces a valid file but reintroduces the coercion risk wherever the converter emits an ambiguous string unquoted.

Flattening to CSV is the lossiest step. Nested JSON or YAML objects must be either flattened into dotted column names like address.city or serialized back into a string inside one cell, and arrays of varying length do not map cleanly to fixed columns at all. The reverse, CSV to JSON, is straightforward for flat data but requires you to decide how to re-infer types, since CSV's everything-is-text model means you choose whether 'false' becomes a boolean or stays a string.
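Both directions can be sketched in a few lines of Python. The flatten and infer helpers below are hypothetical and encode just one possible policy: nested objects become dotted column names, lists are serialized into a single cell, and types are guessed on the way back. The record is invented for illustration.

```python
import csv
import io
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into dotted keys; lists become JSON text."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        elif isinstance(value, list):
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat


def infer(text):
    """One possible policy for re-inferring types from CSV text."""
    if text.lower() in ("true", "false"):
        return text.lower() == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


record = {
    "name": "Ada",
    "address": {"city": "Oslo", "zip": "0150"},
    "tags": ["a", "b"],
    "active": False,
}
flat = flatten(record)

buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=list(flat))
writer.writeheader()
writer.writerow(flat)

buffer.seek(0)
row = next(csv.DictReader(buffer))
restored = {key: infer(value) for key, value in row.items()}
```

The round trip is lossy in exactly the ways described: the zip code "0150" comes back as the integer 150, and tags stays a JSON string unless you decode that column explicitly. An inference policy that is safe for one dataset can corrupt another, so it is a per-column decision.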

When you do need to convert, DevFmt's JSON to YAML, JSON to CSV, CSV to JSON, and YAML to JSON converters run entirely in your browser, so the data never leaves your machine. That matters when you are pasting in config or exports that may contain secrets or customer records. Whatever tool you use, inspect the output rather than trusting the round trip: the loss of comments, the flattening of nesting, and silent type coercion are where conversions quietly go wrong.
