DailyTools
Data Engineering | April 15, 2026 | 9 min read

CSV, JSON, and XML: Choosing the Right Data Format and Converting Between Them

Three data formats dominate the software ecosystem. Learn when to use each, how they handle edge cases differently, and the practical techniques for converting between them without data loss.

CSV, JSON, and XML are the three data interchange formats that every developer encounters regularly. APIs return JSON. Spreadsheets export CSV. Enterprise integrations speak XML. Legacy systems, cloud services, databases, and analytics platforms each have their preferred format — and the mismatch between what one system produces and what another consumes is a constant source of friction in data pipelines.

Choosing the right format for a given use case, and converting between formats without data loss, requires understanding what each format can and cannot express. The differences are more subtle than they appear: CSV cannot represent nested data, JSON has no native concept of attributes versus elements, and XML's verbose syntax carries significant overhead. Each trade-off matters.

CSV: The Universal Tabular Format

CSV (Comma-Separated Values) is the simplest data format: rows of values separated by commas (or other delimiters), with an optional header row. Its simplicity is both its greatest strength and its greatest limitation. CSV is universally supported — every spreadsheet application, database, and programming language can read and write CSV. It is human-readable in a text editor and compact (no structural overhead like tags or braces).

CSV's limitations become apparent quickly: it has no native type system (everything is a string), no standard way to represent nested or hierarchical data, and no metadata. RFC 4180, the closest thing to a specification, is an informational document that many implementations predate or only partly follow. Different implementations therefore handle quoting, escaping, and encoding differently, which is why CSV parsing bugs are notoriously common.

  • Best for: Flat tabular data with consistent columns — database exports, spreadsheet data, simple logs
  • Not suitable for: Nested structures, mixed-type data, configuration files, or any data with parent-child relationships
  • Edge cases: Fields containing commas must be quoted. Fields containing quotes must escape them by doubling (""). Newlines within fields require quoting. Empty fields produce empty strings, not null.
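The quoting rules above are easier to trust when a library applies them. The following is a minimal sketch using Python's standard csv module; the rows are made-up sample data chosen to hit each edge case (a comma, embedded quotes, a newline, and an empty field).

```python
import csv
import io

# Hypothetical rows chosen to exercise the edge cases listed above.
rows = [
    ["id", "name", "note"],
    ["1", "Chen, Alice", 'She said "hi"'],   # comma and embedded quotes
    ["2", "Bob Park", "line one\nline two"],  # newline inside a field
    ["3", "Dana Lee", ""],                    # empty field
]

# Write to an in-memory buffer; newline="" leaves line endings to the csv module.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
encoded = buffer.getvalue()

# Read the text back; every field should survive unchanged.
decoded = list(csv.reader(io.StringIO(encoded, newline="")))
```

Here the writer quotes only the fields that need it and doubles the embedded quotes, and the empty field comes back as an empty string rather than a null value.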

JSON: The Web's Native Data Format

JSON (JavaScript Object Notation) has become the dominant data interchange format for web APIs, configuration files, and NoSQL databases. It supports six data types (string, number, boolean, null, array, object) and arbitrary nesting depth, and parsers ship with or are readily available for virtually every modern programming language. JSON's syntax maps directly to the data structures developers work with: objects/dictionaries and arrays/lists.

JSON's main limitations are the lack of comments (making it awkward for human-edited config files, which is why JSONC and JSON5 exist), no date type (dates are typically ISO 8601 strings), no binary data support (binary must be Base64-encoded as strings), and no schema enforcement (any key can hold any type, which JSON Schema addresses).

json
{
  "employees": [
    {
      "id": 1,
      "name": "Alice Chen",
      "department": "Engineering",
      "skills": ["Go", "Kubernetes", "PostgreSQL"],
      "active": true
    },
    {
      "id": 2,
      "name": "Bob Park",
      "department": "Design",
      "skills": ["Figma", "CSS", "Motion"],
      "active": false
    }
  ],
  "total": 2,
  "page": 1
}
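The missing date and binary types mentioned earlier are usually worked around by encoding both as strings. The sketch below assumes ISO 8601 for dates and Base64 for bytes; the field names created_at and thumbnail and the helper encode_extra are hypothetical, chosen only for illustration.

```python
import base64
import json
from datetime import datetime, timezone

# Hypothetical record containing two types JSON cannot express directly.
record = {
    "created_at": datetime(2026, 4, 15, 9, 30, tzinfo=timezone.utc),
    "thumbnail": b"\x89PNG\r\n",
}

def encode_extra(value):
    """Fallback for types json.dumps cannot serialize on its own."""
    if isinstance(value, datetime):
        return value.isoformat()  # ISO 8601 string
    if isinstance(value, bytes):
        return base64.b64encode(value).decode("ascii")
    raise TypeError(f"Cannot serialize {type(value).__name__}")

payload = json.dumps(record, default=encode_extra)

# Decoding is the consumer's job: JSON itself only sees two plain strings.
loaded = json.loads(payload)
created_at = datetime.fromisoformat(loaded["created_at"])
thumbnail = base64.b64decode(loaded["thumbnail"])
```

Because nothing in the payload marks these strings as a date or as binary, producer and consumer have to agree on the convention out of band, for example through a JSON Schema.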

XML: The Enterprise Workhorse

XML (Extensible Markup Language) predates JSON and was the dominant data interchange format from the late 1990s until JSON displaced it for most web APIs during the 2010s. It remains deeply embedded in enterprise software (SOAP APIs, SAML authentication, Spring configuration), document formats (DOCX, SVG, RSS, XHTML), and regulated industries (healthcare HL7 v3/CDA, finance FIXML/FpML) where formal schemas and validation are required.

XML's key advantage over JSON is its mature ecosystem: XML Schema (XSD) provides rigorous type and structure validation, XSLT enables powerful document transformation, XPath provides a query language for navigating documents, and namespaces prevent name collisions when combining data from multiple sources. These features matter in enterprise contexts where contracts between systems must be formally specified and validated.
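As a small illustration of namespaces preventing name collisions, the sketch below parses a document in which two vocabularies both define a title element. The namespace URIs and element names are invented for the example, and Python's built-in ElementTree supports only a limited subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical document mixing two vocabularies that both define <title>.
document = """
<catalog xmlns:bk="http://example.com/books" xmlns:mu="http://example.com/music">
  <bk:title>Clean Code</bk:title>
  <mu:title>Kind of Blue</mu:title>
</catalog>
"""

root = ET.fromstring(document)

# Prefixes used in queries are mapped to namespace URIs by this dictionary.
namespaces = {
    "bk": "http://example.com/books",
    "mu": "http://example.com/music",
}

# The two <title> elements stay distinct because their namespaces differ.
book_title = root.find("bk:title", namespaces).text
music_title = root.find("mu:title", namespaces).text
```

The prefixes in the query only need to match the dictionary, not the prefixes used in the document; the namespace URI is what identifies the element.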

XML's key disadvantage is verbosity. Element content needs both an opening and a closing tag, so XML documents are typically larger than equivalent JSON, sometimes by a factor of two or more for small, field-heavy records. The choice between attributes and child elements has no universally agreed convention, which leads to inconsistent document designs. XML parsers also have more to handle (namespaces, entities, DTDs, mixed content), so parsing is generally more complex and often slower than parsing JSON.

Converting Between Formats

JSON to CSV

JSON-to-CSV conversion is straightforward when the JSON is a flat array of objects with consistent keys. Each object becomes a row; each unique key becomes a column header. The challenge arises with nested data: a JSON field containing an object or array has no natural CSV representation. Common approaches include serializing nested values as JSON strings within the CSV cell, flattening nested objects using dot notation (user.address.city becomes a column named 'user.address.city'), or ignoring nested fields entirely.
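A minimal sketch of two of these approaches combined: nested objects are flattened with dot notation, and arrays are serialized as JSON strings inside the cell. The function names flatten and json_to_csv and the sample records are illustrative assumptions, not part of any particular tool.

```python
import csv
import io
import json

def flatten(obj, prefix=""):
    """Flatten nested dicts into dot-notation keys; lists become JSON strings."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        elif isinstance(value, list):
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat

def json_to_csv(records):
    rows = [flatten(record) for record in records]
    # Union of keys in first-seen order, so objects with differing keys still fit.
    columns = list(dict.fromkeys(key for row in rows for key in row))
    out = io.StringIO(newline="")
    writer = csv.DictWriter(out, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

# Hypothetical input with one nested object and one array per record.
records = [
    {"id": 1, "user": {"name": "Alice Chen", "address": {"city": "Oslo"}},
     "skills": ["Go", "Kubernetes"]},
    {"id": 2, "user": {"name": "Bob Park"}, "skills": []},
]
csv_text = json_to_csv(records)
```

The second record has no address, so its user.address.city cell is left empty. A consumer cannot tell that apart from an empty string in the source, which is one of the ways this conversion loses information.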

CSV to JSON

CSV-to-JSON conversion requires deciding how to handle data types. Raw CSV has no type information: the string '42' could be a number or an identifier that must stay a string, and '1' could be a count or a boolean-like flag. Intelligent converters apply type coercion: strings that look like numbers become JSON numbers, 'true'/'false' become booleans, and empty cells become null. This heuristic is usually correct but can misfire. Zip codes starting with 0 (like '07052') lose their leading zero when converted to numbers.
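One way to apply this heuristic while protecting values like zip codes is sketched below. The names coerce, csv_to_json, and the keep_as_text parameter are hypothetical; the sketch also assumes every row has as many cells as the header.

```python
import csv
import io
import json
import re

# Plain integers and decimals only; a leading zero such as "07052" does not match.
NUMBER = re.compile(r"-?(0|[1-9]\d*)(\.\d+)?")

def coerce(cell):
    """Guess a JSON type for one CSV cell."""
    if cell is None or cell == "":
        return None
    if cell in ("true", "false"):
        return cell == "true"
    if NUMBER.fullmatch(cell):
        return float(cell) if "." in cell else int(cell)
    return cell

def csv_to_json(text, keep_as_text=()):
    """Convert CSV text to a JSON array; columns in keep_as_text skip coercion."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    records = [
        {key: (value if key in keep_as_text else coerce(value))
         for key, value in row.items()}
        for row in reader
    ]
    return json.dumps(records)

sample = "id,zip,active,score,note\n1,07052,true,4.5,\n2,10001,false,3,ok\n"
converted = json.loads(csv_to_json(sample, keep_as_text={"zip"}))
```

Rejecting leading zeros in the pattern is not enough by itself: without keep_as_text, '07052' would stay a string while '10001' in the same column became a number. Naming the columns that must stay text keeps each column's type consistent.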

XML to JSON

XML-to-JSON conversion is inherently lossy because XML has concepts that JSON does not: attributes versus child elements, processing instructions, comments, and mixed content (text interleaved with elements). There is no single standard mapping, and libraries differ in the details. One common convention is to place attributes under a special '@attributes' key and group repeated sibling elements into arrays. Text content of elements with attributes goes under a '#text' key.

text
<!-- XML -->
<book id="101" lang="en">
  <title>Clean Code</title>
  <authors>
    <author>Robert C. Martin</author>
  </authors>
</book>

// JSON equivalent
{
  "book": {
    "@attributes": { "id": "101", "lang": "en" },
    "title": "Clean Code",
    "authors": {
      "author": "Robert C. Martin"
    }
  }
}
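The mapping shown above can be sketched in a few lines with Python's built-in ElementTree parser. The function names element_to_value and xml_to_json are illustrative, and this sketch deliberately ignores comments, processing instructions, namespaces, and text that follows a child element (mixed content).

```python
import json
import xml.etree.ElementTree as ET

def element_to_value(element):
    """Convert one element using the '@attributes' / '#text' convention."""
    children = list(element)
    text = (element.text or "").strip()
    if not children and not element.attrib:
        return text  # leaf element without attributes: plain string
    node = {}
    if element.attrib:
        node["@attributes"] = dict(element.attrib)
    if text:
        node["#text"] = text
    for child in children:
        value = element_to_value(child)
        if child.tag in node:
            # Repeated sibling: promote the existing entry to a list.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    return node

def xml_to_json(xml_text):
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_value(root)}, indent=2)

book_xml = (
    '<book id="101" lang="en"><title>Clean Code</title>'
    "<authors><author>Robert C. Martin</author></authors></book>"
)
book_json = xml_to_json(book_xml)
```

With this convention a single author is emitted as a string, while two author elements would produce an array. Consumers therefore have to handle both shapes unless the converter is told which elements are always lists.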

When to Use Each Format

  • Use CSV when: exchanging flat tabular data with spreadsheet users, importing/exporting database tables, processing large datasets where compactness matters, or when the consumer expects rows and columns
  • Use JSON when: building web APIs, storing configuration files, working with JavaScript/TypeScript applications, exchanging nested or hierarchical data, or communicating with NoSQL databases
  • Use XML when: integrating with enterprise SOAP services, working with SVG or RSS feeds, operating in regulated industries that mandate XML schemas, or when formal validation with XSD is required
  • Use Protocol Buffers or MessagePack when: performance and bandwidth are critical (binary formats are usually smaller and faster to parse than JSON or XML, though the gain depends heavily on the data and the library)

Common Pitfalls

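  • CSV encoding: Always specify UTF-8 encoding explicitly when reading and writing. When opening a CSV file that has no byte order mark, Excel typically assumes the system locale encoding (often Windows-1252), which garbles international text; writing UTF-8 with a BOM is a common workaround.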
  • JSON number precision: JavaScript's Number type is an IEEE 754 double and cannot exactly represent every integer above 2^53 - 1 (Number.MAX_SAFE_INTEGER). Database IDs from 64-bit systems (like Twitter/X snowflake IDs) should be transmitted as strings; otherwise JSON.parse silently rounds them to the nearest representable value.
  • XML namespace conflicts: When combining XML from multiple sources, unqualified element names may collide. Always use namespace-aware parsers and declare namespaces explicitly.
  • CSV delimiter ambiguity: Some locales use semicolons instead of commas as the CSV delimiter (because commas are used as decimal separators). TSV (tab-separated) sidesteps this particular ambiguity, although fields that contain tabs then need their own handling.
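The number precision pitfall can be demonstrated without a JavaScript runtime. In the sketch below, Python's float (also an IEEE 754 double) stands in for a JavaScript Number; the ID value and the encode_ids helper are hypothetical.

```python
import json

MAX_SAFE_INTEGER = 2**53 - 1  # same value as JavaScript's Number.MAX_SAFE_INTEGER

snowflake_id = 1234567890123456789  # hypothetical 64-bit ID

# Python ints are arbitrary precision, so float() stands in for a JavaScript Number.
as_double = int(float(snowflake_id))
precision_lost = as_double != snowflake_id

def encode_ids(record, id_fields=("id",)):
    """Return a copy in which oversized integer IDs are sent as strings."""
    return {
        key: str(value)
        if key in id_fields and isinstance(value, int) and abs(value) > MAX_SAFE_INTEGER
        else value
        for key, value in record.items()
    }

payload = json.dumps(encode_ids({"id": snowflake_id, "likes": 42}))
```

Only fields named in id_fields are converted, so ordinary counters such as likes remain JSON numbers. Many APIs go a step further and always send IDs as strings, which avoids a field changing type once values grow past the safe range.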
