What Is a Flat File?

The ultimate guide to flat files, their use cases, format specifics, and how they stack up against non-flat file databases and relational DBMSs

Alex Prokudin - Guest Contributor January 30, 2025 · 3 min read

Alex is a tech writer with a background in information security, identity management, and SaaS.

Flat files are an incredibly common format in data analytics. This guide explains what they are, what types of flat files exist, and when you would pick one over the other.

Jump to file:

CSV (Comma-Separated Values)
TSV (Tab-Separated Values)
PSV (Pipe-Separated Values)
Fixed-Width Files
TXT Files
JSON (JavaScript Object Notation)
XML (eXtensible Markup Language)
YAML (YAML Ain’t Markup Language)
ENV Files

The Definition Challenge

Most of the time, you deal with flat files when you’re extracting data from a database and feeding it to a processing pipeline. Chances are, you’ll be using the simplest representation of that data like a CSV file—a text file where each row represents a database record, and fields are separated with a comma. That’s where the “text database” and “flat file database” monikers come from.

However, the term “flat file” itself is used so loosely by companies that there is simply no single definition that would satisfy everybody. Here are some generally agreed-upon characteristics:

Text-only: flat files have no blobs and store text and numbers only.
Tabular format: one table per file, row-based organization of records.
Portable: it’s usually very easy to output, edit and ingest flat files, as they don’t require specialist software.
Unstructured: generally no hierarchy or relationships between records or fields.
No built-in compression: flat files don’t have smart compression algorithms by design, although tools like DuckDB and Spark will accept zip-compressed files.
No index: flat files have no built-in method to find a specific record.
No documentation: generally, they have no embedded descriptive metadata or schema information.

Even this attempt at a definition starts falling apart when considering JSON and YAML - two file formats commonly referred to as flat - which absolutely can have both hierarchy and schema.

It doesn’t really help that flat files can serve vastly different use cases, either:

CSV is typically used as a data storage and exchange format.
JSON is everywhere: exchanging large amounts of data, managing configurations, plus modern APIs mostly return JSON.
YAML is primarily a configuration management and pipeline description file format.

Let’s try to make sense of that.

Types of Flat Files

There’s more than one way to slice flat files into types. Much of that is about how data is organized and stored.

Field and record organization. Unstructured flat files (e.g. CSV) contain a single database table where each row represents a record. Structured flat files (JSON, YAML, XML) mostly have a hierarchy.
Field delimiter type. When one row represents one database record, a special character can be used to separate parts of that row into fields. Typically, it’s a comma, a tab, or a pipe (”|”).
Fixed-width vs non-fixed-width formats. With fixed-width formats, every field has a predefined width, which makes it cheaper to parse data. Writing and storing fixed-width data, however, may have some overhead as compared to some non-fixed-width formats.
Data delimiters. File formats can dictate whether you can save tabs and newlines inside a field. For example, CSV uses escape syntax to allow that, while TSV disallows tabs and newlines.
Human readability. It is usually assumed that flat files are human-readable because they only contain plain text. However, some platform vendors refer to Excel files as flat files, even though they are zipped XML files with an XML schema.
Metadata. Most flat file formats, like CSV, don’t have any metadata and need external documentation that explains what data is considered valid in each field and how it should be processed. Others, like JSON, can contain embedded metadata or schema and, therefore, are self-descriptive.

How Flat Files Are Used

Flat files don’t all have the same applications. Some work great for porting data across systems, others aren’t even designed for storing large amounts of data and serve entirely different purposes.

Storage And Exchange

Data exchange. Flat files like CSV and JSON are commonly used to exchange data between applications or platforms. Many platforms already support them out of the box, and you can write your own data source connector for a platform that doesn’t support them.
Data integration in ETL. Flat files often act as a source or destination for data in ETL pipelines, so you can transform them and load them into data warehouses.
Archiving and backup. Flat files are a common choice to store historical or archived data. Because they are text-only, the data remains accessible without dependency on proprietary software.

Utility Use

Configuration management. Flat files like YAML, JSON, and INI are commonly used to define configuration settings for applications and systems: environment variables, database connections, feature toggles, and more.
Data pipeline definitions. Flat files like JSON and YAML are often used to define the structure and flow of data pipelines to automate data processing tasks.
Metadata for datasets. JSON is sometimes used to describe how to validate and transform CSV files.

Examples of Flat Files

Let’s look at the most common flat file formats: what they look like, where they excel, and where they are not the best choice.

CSV (Comma-Separated Values)

File extension: .csv
Delimiter: Comma (,)
Field Organization: Flat
Human Readable: Yes

CSV files are typically used in data exchange, ETL pipelines, and lightweight tabular data analysis. It’s also a standard export option for data tools such as Excel, PowerBI, Looker. CSVs are often compressed with gzip or zip, in which case the extension may be .csv.gz or .csv.zip.

Example:

name, country, age
Alice, USA, 22
Bob, Canada, 34
Charlie, UK, 28

Choose if:

You need a widely supported format for structured tabular data.
Your data is coming from BI systems or Excel/Google Sheets.

Avoid if:

You need to represent complex hierarchies.
Your data contains many commas that require parsing escape sequences.

Tip: You can easily add CSVs as a data source in Evidence. Learn how

TSV (Tab-Separated Values)

File extension: .tsv
Delimiter: Tab (\t)
Field Organization: Flat
Human Readable: Yes

TSV is typically used in scientific computing (e.g. in The Cancer Genome Atlas project) and workflows where commas might conflict with data values (e.g. address fields).

Example:

name	country	age
Alice	USA	22
Bob	Canada	34
Charlie	UK	28

Choose if:

Your data contains commas and you need an alternative delimiter that minimizes conflicts.
You need to use simple Unix CLI tools to parse the data.

Avoid if:

Your data contains tabs that require escaping.
You need to store hierarchical data.

PSV (Pipe-Separated Values)

File extension: .psv
Delimiter: Pipe (|)
Field Organization: Flat
Human Readable: Yes

PSV is typically used in financial systems (e.g. Bank of America), industry-specific workflows, and proprietary data exchange where unique delimiters are required.

Example:

name|country|age
Alice|USA|22
Bob|Canada|34
Charlie|UK|28

Choose if:

You need a delimiter that is less likely to appear in data fields.

Avoid if:

Your tools lack native support for parsing pipe-separated files.
You need to represent nested data.

Fixed-Width Files

File extension: Varies
Delimiter: None (fixed-width fields)
Field Organization: Flat
Human Readable: Limited

Fixed-width files are typically used in legacy systems, financial reporting, and mainframe data processing.

Example:

Name       Country    Age
Alice      USA        22
Bob        Canada     34
Charlie    UK         28

Choose if:

You need speed and low resource consumption when reading files.
All your systems that read these files know how long each “column” is.

Avoid if:

You need flexible field length.
The overhead of writing fixed-width files may outweigh the benefits of high performance when reading them.

TXT Files

File extension: .txt
Delimiter: None or arbitrary
Field Organization: Flat or hierarchical
Human Readable: Yes

TXT files are typically used for simple text storage, writing log files, and unstructured data exchange.

Example:

2025-01-24 12:00:00 INFO  Data pipeline started: Extracting data from source 'sales_db'.
2025-01-24 12:01:15 WARN  Missing values detected in column 'Region'. Proceeding with default value 'Unknown'.
2025-01-24 12:02:45 ERROR Failed to connect to API 'weather_service'. Retrying in 30 seconds.
2025-01-24 12:03:15 INFO  Data transformation completed: 10,000 rows processed.

Choose if:

You need a lightweight, generic format for storing plain text data.

Avoid if:

Your data needs a hierarchy.

JSON (JavaScript Object Notation)

File extension: .json
Delimiter: None (uses key-value pairs and nested objects)
Field Organization: Hierarchical
Human Readable: Yes

JSON is typically used in web APIs, data exchange, and nested or semi-structured data processing.

Example:

[
	{
		"name": "Alice",
		"country": "USA",
		"age": 22
	},
	{
		"name": "Bob",
		"country": "Canada",
		"age": 34
	},
	{
		"name": "Charlie",
		"country": "UK",
		"age": 28
	}
]

Choose if:

Your data includes hierarchies or nested relationships.
Your data needs to be machine-readable as well as human-readable.

Avoid if:

Data processing speed is a priority.
You have data types not supported by JSON.

XML (eXtensible Markup Language)

File extension: .xml
Delimiter: None (uses tags)
Field Organization: Hierarchical
Human Readable: Yes (but verbose)

This format was used in enterprise data exchange, web services (e.g., SOAP), and document storage with metadata, but it has largely been superseded by JSON for most use cases.

Example:

<people>
  <person>
    <name>Alice</name>
    <country>USA</country>
    <age>22</age>
  </person>
  <person>
    <name>Bob</name>
    <country>Canada</country>
    <age>34</age>
  </person>
  <person>
    <name>Charlie</name>
    <country>UK</country>
    <age>28</age>
  </person>
</people>

Choose if:

You need a highly structured format with metadata and validation capabilities.
You need compatibility with older systems that require XML.

Avoid if:

You prioritize file size efficiency or readability because XML is verbose.

YAML (YAML Ain’t Markup Language)

File extension: .yaml or .yml
Delimiter: None (uses indentation for structure)
Field Organization: Hierarchical
Human Readable: Yes

YAML is typically used in configuration files, GitHub workflows, and data serialization.

Example:

restart-strategy:
  type: failure-rate
  failure-rate:
    delay: 1 s
    failure-rate-interval: 1 min
    max-failures-per-interval: 1

Choose if:

You need a human-readable format for configuring pipeline jobs and orchestrating workflows.

Avoid if:

You need to store a large dataset.

ENV Files

File extension: .env
Delimiter: None (uses = for key-value pairs)
Field Organization: Flat
Human Readable: Yes

ENV files are typically used in storing environment variables and application configurations.

Example:

APP_NAME=MyAnalyticsApp
ENVIRONMENT=production
DATABASE_URL=postgres://user:password@localhost:5432/mydb
API_KEY=abcdef123456
DEBUG=False

Choose if:

You need a lightweight and straightforward way to manage configurations for deployment or local environments

Avoid if:

You need advanced data structures or are storing large amounts of data.

Flat Files vs Non-Flat Files vs DBMS

Flat files are commonly compared against relational databases, although there is a middle-ground: file formats like Avro, Parquet, and ORC. Let’s compare the most popular options by the features we discussed in the beginning.

Feature	CSV	JSON	Parquet	Relational DBMS
Records organization	Row-based	Key-value pairs	Columnar	Row-based
Human readability	Text	Text	Binary	Binary
Portability	High	High	High	Low
Hierarchy	None	Available	Nested structures	Multiple tables
Scalability	Low	Low	High	High
Index	No	No	Yes*	Yes
Schema	None	Possible	Enforced	Enforced

* Parquet doesn’t have B-trees or hash indexes. Instead, it has file-level, row group-level, and column-level metadata that helps decide which data to read and which to skip instead of scanning the entire dataset.

How to Select The Right Flat File Format

With this many options, you can use a simple rule of thumb:

Use flat files with a delimiter, such as CSV and TSV, if you need a simple file format to archive data or move it between platforms or spreadsheet software like Excel and Google Sheets.
Use JSON if you want a self-descriptive file format with a hierarchy, for either data storage/exchange or configurations.
Use other flat file formats when you need to integrate with tools that need them.
Consider using file formats like Parquet if you need smaller file sizes, faster queries, and support for complex data types.

Start Analyzing Flat Files Data with Evidence

Evidence allows you to analyse and display data from flat files. It accepts JSON, CSV, and Parquet files. It also supports most popular analytics databases as data sources. To get started with Evidence and create your first data product, install the VSCode extension.