Data Lake — S3 Buckets Consumer Guide
This guide explains how report data is prepared when we write it to the S3 buckets you access, and how you should read and use that data so it loads correctly (e.g. into your analytics tools).
1. How We Prepare the Data
Report files in the data lake are CSV (comma-separated values). We generate them using Amazon Redshift UNLOAD with the following options:
| Setting | Value | Description |
|---|---|---|
| Delimiter | Comma (`,`) | Field separator. |
| Field enclosure | Double quote (`"`) | Every field is wrapped in double quotes. |
| Escape character | Backslash (`\`) | Special characters inside fields are escaped with `\`. |
| Header | Yes | First row contains column names. |
| Encoding | UTF-8 | File encoding. |
1.1 Escaping Rules (Important)
Because we use backslash as the escape character, any comma, newline, quote, or backslash that appears inside a field is escaped as follows:
| Character in data | How it appears in the CSV |
|---|---|
| Comma (`,`) | `\,` |
| Line feed | `\n` |
| Carriage return | `\r` |
| Double quote (`"`) | `\"` |
| Backslash (`\`) | `\\` |
So:

- Do not treat `\n` as a real newline when parsing — it is an escaped sequence representing a newline that was in the original value.
- Do not split on every comma — commas inside quoted fields are written as `\,`.
- A double quote inside a value is written as `\"`, not as `""` (we do not use RFC 4180 double-quote escaping).
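If you ever need to undo this escaping by hand (for example, after extracting a field with custom tooling), the table above maps to a small helper like the following sketch. The `unescape` and `ESCAPES` names are hypothetical, not part of any shipped tooling, and the `n`/`r` mappings follow the table's statement that `\n` and `\r` represent a line feed and carriage return from the original value.

```python
# Mapping from the character after a backslash to the original character,
# per the escaping table above.
ESCAPES = {
    ",": ",",    # \,  -> comma
    '"': '"',    # \"  -> double quote
    "\\": "\\",  # \\  -> backslash
    "n": "\n",   # \n  -> line feed
    "r": "\r",   # \r  -> carriage return
}

def unescape(field: str) -> str:
    """Undo backslash escaping on a single field (quotes already removed)."""
    out = []
    i = 0
    while i < len(field):
        if field[i] == "\\" and i + 1 < len(field):
            nxt = field[i + 1]
            out.append(ESCAPES.get(nxt, nxt))
            i += 2
        else:
            out.append(field[i])
            i += 1
    return "".join(out)
```

In practice a properly configured CSV parser does this for you; the helper only makes the rules explicit.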
1.2 Example Rows
Header:

`"column_a","column_b","column_c"`

Data row (no special characters):

`"value1","value2","value3"`

Data row (comma and quote inside a field):

`"value1","Hello, \"World\"","value3"`

Here, the second column’s value is: `Hello, "World"`.

Data row (newline in data, shown escaped):

`"value1","Line1\nLine2","value3"`

The second column’s value is: `Line1` + newline + `Line2`.
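The "comma and quote" example row above can be checked with an escape-aware reader. A minimal sketch using Python's standard `csv` module; the `doublequote=False` setting matters here, since quotes are escaped as `\"` rather than doubled:

```python
import csv
import io

# The "comma and quote inside a field" example row, as raw file text.
raw = '"value1","Hello, \\"World\\"","value3"\n'

reader = csv.reader(
    io.StringIO(raw),
    delimiter=",",
    quotechar='"',
    escapechar="\\",
    doublequote=False,  # quotes arrive as \" , not doubled as ""
)
row = next(reader)
print(row)  # ['value1', 'Hello, "World"', 'value3']
```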
1.3 Null and Empty Values
- NULL in the source is written as an empty quoted field: `""`.
- An empty string is also written as `""`.

Your loader may treat both as NULL or as an empty string, depending on your rules; the file itself does not distinguish the two cases.
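A minimal sketch of one possible rule, assuming you choose to map `""` to NULL after parsing; the `normalize` name is hypothetical:

```python
def normalize(value):
    # One possible rule: treat empty quoted fields as NULL.
    # Adjust to your own schema or bronze-layer conventions.
    return None if value == "" else value

row = ["value1", "", "value3"]
print([normalize(v) for v in row])  # ['value1', None, 'value3']
```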
2. How You Should Use the Data
To avoid bad records, misaligned columns, and “values spilling to the next line,” your reader must use the same format options we use when writing.
2.1 Required Parser Settings
When reading these CSV files, configure your parser as follows:
| Option | Use this value |
|---|---|
| Delimiter | `,` (comma) |
| Quote character | `"` (double quote) |
| Escape character | `\` (backslash) |
| Header | First row = column names |
| Encoding | UTF-8 |
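The table above maps directly onto Python's standard `csv` module parameters. A sketch of the settings as reusable keyword arguments; `doublequote=False` follows from section 1.1, since quotes arrive escaped as `\"`, never doubled as `""`:

```python
import csv
import io

# The required parser settings, as stdlib csv keyword arguments.
READER_KWARGS = dict(
    delimiter=",",
    quotechar='"',
    escapechar="\\",
    doublequote=False,
)

# Quick check against a header line in the documented format.
header = next(csv.reader(io.StringIO('"column_a","column_b"'), **READER_KWARGS))
print(header)  # ['column_a', 'column_b']
```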
2.2 Steps to Load the Data Correctly
- **Use a CSV parser that supports an escape character.**
  Ensure the parser can use backslash (`\`) as the escape character, not only doubled double-quotes (`""`).
- **Set delimiter, quote, and escape explicitly.**
  - Delimiter: comma.
  - Quote: double quote.
  - Escape: backslash.

  Do not rely on "auto" or RFC 4180–only behavior if that would ignore the escape character.
- **Treat the first row as the header.**
  Use it for column names; do not treat it as data.
- **Use UTF-8 when reading.**
  Open or decode the file as UTF-8 to avoid issues with special characters.
- **Handle empty quoted fields.**
  Decide how `""` should be interpreted (e.g. NULL or empty string) in your schema or bronze layer.
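Put together, the steps above can be sketched as a small loader using Python's standard `csv` module. The `load_report` function name and the sample file name are hypothetical, and the `""` handling (mapping to `None`) is one possible choice for step 5:

```python
import csv

def load_report(path):
    """Sketch: load one report part file following the steps above."""
    with open(path, newline="", encoding="utf-8") as f:  # step 4: UTF-8
        reader = csv.DictReader(      # step 3: first row becomes column names
            f,
            delimiter=",",            # step 2: delimiter
            quotechar='"',            # step 2: quote
            escapechar="\\",          # steps 1-2: escape-aware parsing
            doublequote=False,
        )
        # Step 5: decide how "" is interpreted; here it becomes None.
        return [{k: (v if v != "" else None) for k, v in row.items()}
                for row in reader]

# Tiny self-check with a sample file in the documented format.
with open("sample_000.csv", "w", newline="", encoding="utf-8") as f:
    f.write('"column_a","column_b"\n"value1",""\n')

rows = load_report("sample_000.csv")
print(rows)  # [{'column_a': 'value1', 'column_b': None}]
```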
2.3 Steps to Open the Files in Excel
Best practice: import the file into Excel (do NOT double-click it).

1. Open Excel.
2. Go to Data → Get Data → From Text/CSV.
3. Select the part file (e.g. `000`).
4. In the import dialog, configure:
   - File Origin: UTF-8 (csv).
   - Delimiter: Comma.
   - Text qualifier: `"` (double quote).
   - Escape character: `\` (backslash).
5. Load the data.

This forces Excel to respect proper CSV quoting rules.
3. What to Avoid
- Do not assume RFC 4180 only (i.e. that the only escape inside quoted fields is `""`). Our files use `\"`, so RFC 4180–only parsers will misparse and can produce bad records or "double quotes inside values" errors.
- Do not assume one physical line = one record. Newlines inside fields are escaped as `\n`; only an escape-aware parser can reconstruct the full field.
- Do not split on every comma. Commas inside fields are escaped as `\,`.
- Do not ignore the escape character. Without `escape = \`, parsers will treat `\"`, `\n`, and `\,` incorrectly.
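A quick illustration of the "do not split on every comma" failure mode, using the section 1.2 example row: naive splitting yields a spurious extra column, while an escape-aware reader (sketched here with Python's `csv` module) recovers the three real fields:

```python
import csv
import io

# The "comma and quote inside a field" example row from section 1.2.
raw = '"value1","Hello, \\"World\\"","value3"'

# Naive splitting breaks inside the quoted second field:
naive = raw.split(",")
print(len(naive))  # 4 -- one column too many

# An escape-aware parser recovers the three real fields:
correct = next(csv.reader(io.StringIO(raw), delimiter=",", quotechar='"',
                          escapechar="\\", doublequote=False))
print(len(correct))  # 3
```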
4. File Layout (Reference)
Report files are written under paths such as:
- Shared / central reports: `s3://s3-data-layer/prod/reports/new/<report_name>_report/` (e.g. `booking_report`, `flights_report`, `services_report`, etc.)
- Per-operator data (BDS / data stream): `s3://fl3xx-data-lake/operator_<operatorid>/<report_name>/`
- Per-operator data lake buckets: `s3://fl3xx-data-lake-prod-operator-<id>/<report_name>/` (e.g. `Airports`, `booking_report`, `flights_report`, `services_report`)

File names and prefixes may include part numbers (e.g. `000`, `001`) depending on the UNLOAD configuration. Use the same CSV options for all of these files.
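For example, part files belonging to one report can be grouped after listing object keys with your S3 client of choice. The keys below are illustrative only, and the trailing three-digit pattern is an assumption based on the `000`/`001` part numbering mentioned above:

```python
import re

# Illustrative object keys; real listings come from your S3 client.
keys = [
    "prod/reports/new/booking_report/booking_report_000",
    "prod/reports/new/booking_report/booking_report_001",
    "prod/reports/new/flights_report/flights_report_000",
]

# Group part files (..._000, ..._001, ...) by report prefix.
parts = {}
for key in keys:
    m = re.match(r"(.+)_(\d{3})$", key)
    if m:
        parts.setdefault(m.group(1), []).append(m.group(2))

print(parts)
# {'prod/reports/new/booking_report/booking_report': ['000', '001'],
#  'prod/reports/new/flights_report/flights_report': ['000']}
```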
5. Summary
| We use | You should use |
|---|---|
| Delimiter: comma (`,`) | Same |
| Fields enclosed in double quotes (`"`) | Same |
| Escape character: backslash (`\`) | Same — required for correct parsing |
| First row = header | Same |
| Encoding: UTF-8 | Same |
Configuring your reader with delimiter comma, quote double quote, and escape backslash will align with how we prepare the data and should prevent bad records, misaligned columns, and incorrect handling of newlines and commas inside fields.