
What Is a Flat File?

The ultimate guide to flat files, their use cases, format specifics, and how they stack up against non-flat file databases and relational DBMSs

Alex is a tech writer with a background in information security, identity management, and SaaS.

Flat files are an incredibly common format in data analytics. This guide explains what they are, what types of flat files exist, and when you would pick one type over another.


The Definition Challenge

Most of the time, you deal with flat files when you’re extracting data from a database and feeding it to a processing pipeline. Chances are, you’ll be using the simplest representation of that data, such as a CSV file: a text file where each row represents a database record and fields are separated by commas. That’s where the “text database” and “flat file database” monikers come from.
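As a rough sketch of that extraction step, the Python snippet below copies rows from a database table into CSV text using only the standard library. The in-memory SQLite database and the people table are hypothetical stand-ins for a real source. Each record becomes one line, and the header row is taken from the query’s column names.

```python
import csv
import io
import sqlite3

# Hypothetical source table, created in memory so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, country TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("Alice", "USA", 22), ("Bob", "Canada", 34), ("Charlie", "UK", 28)],
)

cursor = conn.execute("SELECT name, country, age FROM people ORDER BY rowid")

# Write to an in-memory buffer; for a real file use open("people.csv", "w", newline="").
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow([col[0] for col in cursor.description])  # header row from column names
writer.writerows(cursor)  # one CSV row per database record

csv_text = buffer.getvalue()
print(csv_text)
```

Note that the types are gone once the data is written: 22 is now just the characters “22”, which is one reason CSV is described as having no schema.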

However, the term “flat file” itself is used so loosely by companies that there is simply no single definition that would satisfy everybody. Here are some generally agreed-upon characteristics:

Even this attempt at a definition starts falling apart when you consider JSON and YAML, two file formats commonly referred to as flat that can absolutely have both hierarchy and schema.

It doesn’t really help that flat files can serve vastly different use cases, either:

Let’s try to make sense of that.

Types of Flat Files

There’s more than one way to slice flat files into types. Much of that is about how data is organized and stored.

How Flat Files Are Used

Flat files don’t all have the same applications. Some work great for porting data across systems, while others aren’t designed for storing large amounts of data at all and serve entirely different purposes.

Storage And Exchange

Utility Use

Examples of Flat Files

Let’s look at the most common flat file formats: what they look like, where they excel, and where they are not the best choice.

CSV (Comma-Separated Values)

CSV files are typically used in data exchange, ETL pipelines, and lightweight tabular data analysis. CSV is also a standard export option in data tools such as Excel, Power BI, and Looker. CSVs are often compressed with gzip or zip, in which case the extension may be .csv.gz or .csv.zip.

Example:

name,country,age
Alice,USA,22
Bob,Canada,34
Charlie,UK,28
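A minimal Python sketch of reading CSV with the standard csv module follows. It shows three details that often trip people up: a quoted field may contain commas, a space after a comma is part of the value unless skipinitialspace=True is set, and a .csv.gz file is the same text, just gzip-compressed. The “Smith, Jane” row is made-up sample data.

```python
import csv
import gzip
import io

raw = 'name,country,age\nAlice,USA,22\n"Smith, Jane",Canada,41\n'

# DictReader maps each row to the header; the quoted field keeps its embedded comma.
rows = list(csv.DictReader(io.StringIO(raw)))
print(rows[1])  # every value is a string: CSV carries no type information

# A space after a comma belongs to the value unless skipinitialspace=True.
spaced = "name, country\nAlice, USA\n"
plain = next(csv.DictReader(io.StringIO(spaced)))
trimmed = next(csv.DictReader(io.StringIO(spaced), skipinitialspace=True))
print(repr(plain[" country"]), repr(trimmed["country"]))

# A .csv.gz payload: decompress, then parse as usual.
# For a real file: gzip.open("data.csv.gz", "rt", newline="")
compressed = gzip.compress(raw.encode("utf-8"))
from_gz = list(csv.DictReader(io.StringIO(gzip.decompress(compressed).decode("utf-8"))))
```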

Choose if:

Avoid if:

Tip: You can easily add CSVs as a data source in Evidence.

TSV (Tab-Separated Values)

TSV is typically used in scientific computing (e.g., in The Cancer Genome Atlas project) and in workflows where commas might appear inside data values (e.g., address fields).

Example:

name	country	age
Alice	USA	22
Bob	Canada	34
Charlie	UK	28
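Reading TSV in Python is the CSV sketch with a different delimiter. The hypothetical address value below shows why TSV suits data containing commas: with a tab as the separator, the comma needs no quoting.

```python
import csv
import io

# Tab-separated sample; the address column (made up for illustration) contains a comma.
raw = "name\tcountry\taddress\nAlice\tUSA\t12 Main St, Springfield\n"

reader = csv.reader(io.StringIO(raw), delimiter="\t")
header, *records = list(reader)
print(header)      # ['name', 'country', 'address']
print(records[0])  # ['Alice', 'USA', '12 Main St, Springfield']
```

The standard library also ships an "excel-tab" dialect (csv.excel_tab) that sets the same delimiter.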

Choose if:

Avoid if:

PSV (Pipe-Separated Values)

PSV is typically used in financial systems (e.g., Bank of America), industry-specific workflows, and proprietary data exchange where the delimiter must be a character that rarely appears in the data.

Example:

name|country|age
Alice|USA|22
Bob|Canada|34
Charlie|UK|28
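Since many tools only accept CSV, a common chore is converting PSV. The sketch below assumes nothing beyond Python’s csv module: it reads with delimiter="|" and writes the same rows back out comma-separated.

```python
import csv
import io

psv_text = "name|country|age\nAlice|USA|22\nBob|Canada|34\nCharlie|UK|28\n"

# Read with the pipe delimiter...
rows = list(csv.reader(io.StringIO(psv_text), delimiter="|"))

# ...and re-emit as comma-separated text. csv.writer quotes any value
# that happens to contain a comma, so the conversion stays lossless.
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(rows)
csv_text = out.getvalue()
print(csv_text)
```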

Choose if:

Avoid if:

Fixed-Width Files

Fixed-width files are typically used in legacy systems, financial reporting, and mainframe data processing.

Example:

Name       Country    Age
Alice      USA        22
Bob        Canada     34
Charlie    UK         28
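Fixed-width files have no delimiter at all, so both reading and writing depend on an agreed column layout. The sketch below assumes widths of 11, 11, and “rest of line” (our reading of the sample above). The write_row and parse_row helpers are hypothetical names for illustration.

```python
# Assumed layout: (field, start, end) character positions. The widths (11, 11,
# then the rest of the line) are read off the sample; real files ship a spec.
LAYOUT = [("name", 0, 11), ("country", 11, 22), ("age", 22, None)]


def write_row(values):
    """Left-justify each value to its column width; the last column is unpadded.

    Note: ljust never truncates, so a real writer must reject or cut over-long values.
    """
    parts = []
    for (_, start, end), value in zip(LAYOUT, values):
        text = str(value)
        parts.append(text if end is None else text.ljust(end - start))
    return "".join(parts)


def parse_row(line):
    """Slice a line by column positions and strip the padding spaces."""
    return {field: line[start:end].strip() for field, start, end in LAYOUT}


lines = [write_row(r) for r in [("Alice", "USA", 22), ("Bob", "Canada", 34), ("Charlie", "UK", 28)]]
print("\n".join(lines))

records = [parse_row(line) for line in lines]
print(records[1])  # {'name': 'Bob', 'country': 'Canada', 'age': '34'}
```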

Choose if:

Avoid if:

TXT Files

TXT files are typically used for simple text storage, writing log files, and unstructured data exchange.

Example:

2025-01-24 12:00:00 INFO  Data pipeline started: Extracting data from source 'sales_db'.
2025-01-24 12:01:15 WARN  Missing values detected in column 'Region'. Proceeding with default value 'Unknown'.
2025-01-24 12:02:45 ERROR Failed to connect to API 'weather_service'. Retrying in 30 seconds.
2025-01-24 12:03:15 INFO  Data transformation completed: 10,000 rows processed.
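Because a TXT log has no declared structure, any parsing relies on a convention you choose. The sketch below assumes each line is “date time LEVEL message”, as in the sample. It uses a regular expression of our own to split the lines and count entries per level, and it skips lines that don’t fit.

```python
import re
from collections import Counter

log_text = """\
2025-01-24 12:00:00 INFO  Data pipeline started: Extracting data from source 'sales_db'.
2025-01-24 12:01:15 WARN  Missing values detected in column 'Region'. Proceeding with default value 'Unknown'.
2025-01-24 12:02:45 ERROR Failed to connect to API 'weather_service'. Retrying in 30 seconds.
2025-01-24 12:03:15 INFO  Data transformation completed: 10,000 rows processed.
"""

# Assumed line shape: timestamp, level, then free text. The file itself
# declares nothing; this pattern is our own convention.
LINE = re.compile(r"^(\S+ \S+) (INFO|WARN|ERROR)\s+(.*)$")

entries = []
for line in log_text.splitlines():
    m = LINE.match(line)
    if m:  # skip lines that don't fit the expected shape
        entries.append(m.groups())  # (timestamp, level, message)

levels = Counter(level for _, level, _ in entries)
print(dict(levels))  # {'INFO': 2, 'WARN': 1, 'ERROR': 1}
```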

Choose if:

Avoid if:

JSON (JavaScript Object Notation)

JSON is typically used in web APIs, data exchange, and nested or semi-structured data processing.

Example:

[
	{
		"name": "Alice",
		"country": "USA",
		"age": 22
	},
	{
		"name": "Bob",
		"country": "Canada",
		"age": 34
	},
	{
		"name": "Charlie",
		"country": "UK",
		"age": 28
	}
]
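Unlike CSV, JSON keeps numbers as numbers and allows nesting. That nesting is what makes JSON awkward to call “flat”. A common workaround before loading JSON into tabular tools is to flatten nested objects into dotted column names (pandas offers json_normalize for this). The sketch below hand-rolls a hypothetical flatten helper, and the nested address object is made-up sample data.

```python
import json

raw = """
[
  {"name": "Alice", "country": "USA", "age": 22},
  {"name": "Bob", "address": {"country": "Canada", "city": "Toronto"}, "age": 34}
]
"""


def flatten(obj, prefix=""):
    """Turn nested objects into dotted keys, e.g. {"address": {"city": ...}} -> "address.city"."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value  # lists and scalars are kept as-is in this sketch
    return flat


records = json.loads(raw)
print(records[0]["age"] + 1)  # numbers stay numbers: 23
rows = [flatten(r) for r in records]
print(rows[1])
```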

Choose if:

Avoid if:

XML (eXtensible Markup Language)

XML has traditionally been used in enterprise data exchange, web services (e.g., SOAP), and document storage with metadata, but it has largely been superseded by JSON.

Example:

<people>
  <person>
    <name>Alice</name>
    <country>USA</country>
    <age>22</age>
  </person>
  <person>
    <name>Bob</name>
    <country>Canada</country>
    <age>34</age>
  </person>
  <person>
    <name>Charlie</name>
    <country>UK</country>
    <age>28</age>
  </person>
</people>
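Python’s standard library can read the XML sample with xml.etree.ElementTree. The sketch below turns each person element into a dictionary. Element text is untyped, so the age is converted explicitly.

```python
import xml.etree.ElementTree as ET

xml_text = """
<people>
  <person><name>Alice</name><country>USA</country><age>22</age></person>
  <person><name>Bob</name><country>Canada</country><age>34</age></person>
  <person><name>Charlie</name><country>UK</country><age>28</age></person>
</people>
"""

root = ET.fromstring(xml_text.strip())
people = [
    {
        "name": person.findtext("name"),
        "country": person.findtext("country"),
        "age": int(person.findtext("age")),  # XML text is untyped; convert explicitly
    }
    for person in root.findall("person")
]
print(people[2])  # {'name': 'Charlie', 'country': 'UK', 'age': 28}
```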

Choose if:

Avoid if:

YAML (YAML Ain’t Markup Language)

YAML is typically used in configuration files, GitHub Actions workflows, and data serialization.

Example:

restart-strategy:
  type: failure-rate
  failure-rate:
    delay: 1 s
    failure-rate-interval: 1 min
    max-failures-per-interval: 1

Choose if:

Avoid if:

ENV Files

ENV files are typically used for storing environment variables and application configuration.

Example:

APP_NAME=MyAnalyticsApp
ENVIRONMENT=production
DATABASE_URL=postgres://user:password@localhost:5432/mydb
API_KEY=abcdef123456
DEBUG=False
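The ENV format is simple enough that a minimal reader fits in a few lines. The parse_env function below is a hypothetical sketch, not a standard API: it handles KEY=VALUE lines, comments, and optional surrounding quotes. Libraries such as python-dotenv cover many more edge cases. Note that every value comes back as a string, so DEBUG=False still needs converting.

```python
import os

env_text = """\
# Comments and blank lines are ignored
APP_NAME=MyAnalyticsApp
ENVIRONMENT=production
DATABASE_URL=postgres://user:password@localhost:5432/mydb
API_KEY=abcdef123456
DEBUG=False
"""


def parse_env(text):
    """Hypothetical minimal .env parser: KEY=VALUE per line, split on the first '='."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without '='
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


config = parse_env(env_text)

# Values are plain strings: "False" is truthy unless you convert it yourself.
debug = config["DEBUG"].lower() in ("1", "true", "yes")

# Expose the values to this process without overriding variables already set.
for key, value in config.items():
    os.environ.setdefault(key, value)
```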

Choose if:

Avoid if:

Flat Files vs Non-Flat Files vs DBMS

Flat files are commonly compared against relational databases, although there is a middle ground: file formats like Avro, Parquet, and ORC. Let’s compare the most popular options by the features we discussed at the beginning.

Feature               CSV        JSON              Parquet            Relational DBMS
Record organization   Row-based  Key-value pairs   Columnar           Row-based
Human readability     Text       Text              Binary             Binary
Portability           High       High              High               Low
Hierarchy             None       Available         Nested structures  Multiple tables
Scalability           Low        Low               High               High
Index                 No         No                Yes*               Yes
Schema                None       Possible          Enforced           Enforced

* Parquet doesn’t have B-tree or hash indexes. Instead, it stores file-level, row group-level, and column-level metadata (such as min/max statistics) that helps a reader decide which data to read and which to skip, rather than scanning the entire dataset.
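To make the metadata idea concrete, here is a toy Python model of min/max-based skipping. It is not Parquet’s actual file format or any library’s API. The RowGroup class and query_age_over function are hypothetical, and they show only the principle: if a group’s statistics prove that no row can match the filter, the reader never touches that group’s rows.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """Toy stand-in for a Parquet row group: its rows plus min/max stats for one column."""
    rows: list
    age_min: int
    age_max: int


def make_group(rows):
    ages = [r["age"] for r in rows]
    return RowGroup(rows, min(ages), max(ages))


groups = [
    make_group([{"name": "Alice", "age": 22}, {"name": "Charlie", "age": 28}]),
    make_group([{"name": "Bob", "age": 34}]),
]


def query_age_over(groups, threshold):
    """Return rows with age > threshold, reading only groups whose stats allow a match."""
    scanned = 0
    hits = []
    for g in groups:
        if g.age_max <= threshold:
            continue  # stats prove no row here can match, so the group is skipped
        scanned += 1
        hits.extend(r for r in g.rows if r["age"] > threshold)
    return hits, scanned


hits, scanned = query_age_over(groups, 30)
print([r["name"] for r in hits], scanned)  # ['Bob'] 1
```

A CSV or JSON file has no such statistics, so answering the same query means reading every row.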

How to Select The Right Flat File Format

With this many options, you can use a simple rule of thumb:

Start Analyzing Flat File Data with Evidence

Evidence allows you to analyze and display data from flat files. It accepts JSON, CSV, and Parquet files, and it also supports most popular analytics databases as data sources. To get started with Evidence and create your first data product, install the VS Code extension.