
Data Lake — S3 Buckets Consumer Guide

This guide explains how report data is prepared when we write it to the S3 buckets you access, and how you should read and use that data so it loads correctly (e.g. into your analytics tools).

1. How We Prepare the Data

Report files in the data lake are CSV (comma-separated values). We generate them using Amazon Redshift UNLOAD with the following options:

  • Delimiter: comma (,)
  • Field enclosure: double quote ("). Every field is wrapped in double quotes.
  • Escape character: backslash (\). Special characters inside fields are escaped with \.
  • Header: yes. The first row contains column names.
  • Encoding: UTF-8.

1.1 Escaping Rules (Important)

Because we use backslash as the escape character, any comma, newline, quote, or backslash that appears inside a field is escaped as follows:

  • Comma (,) appears in the CSV as \,
  • Line feed appears as \n
  • Carriage return appears as \r
  • Double quote (") appears as \"
  • Backslash (\) appears as \\

So:

  • Do not treat \n as a real newline when parsing — it is an escaped sequence representing a newline that was in the original value.
  • Do not split on every comma — commas inside quoted fields are written as \,.
  • A double quote inside a value is written as \", not as "" (we do not use RFC 4180 double-quote escaping).
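As an illustration of the writer side, here is a minimal Python sketch of the escaping rules above (not the actual UNLOAD implementation; the function name escape_field is ours):

```python
def escape_field(value: str) -> str:
    """Escape one field the way the rules above describe: put a
    backslash before every comma, quote, backslash, LF, and CR,
    then wrap the whole field in double quotes."""
    out = []
    for ch in value:
        if ch in (',', '"', '\\', '\n', '\r'):
            out.append('\\')  # the escape character
        out.append(ch)
    return '"' + ''.join(out) + '"'

# 'Hello, "World"' becomes "Hello\, \"World\"" (wrapped in quotes)
```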

1.2 Example Rows

Header:

"column_a","column_b","column_c"

Data row (no special characters):

"value1","value2","value3"

Data row (comma and quote inside a field):

"value1","Hello\, \"World\"","value3"

Here, the second column’s value is: Hello, "World".

Data row (newline in data shown as escaped):

"value1","Line1\nLine2","value3"

The second column’s value is: Line1 + newline + Line2.
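These example rows can be checked with Python's csv module (one possible escape-aware reader). This sketch assumes, per Redshift UNLOAD semantics, that an escaped newline is a backslash placed before a real linefeed character:

```python
import csv
import io

# File content as it appears on disk. Each single backslash in the
# file is written as "\\" in this Python literal; the second row
# contains a backslash followed by a real linefeed.
data = (
    '"value1","Hello\\, \\"World\\"","value3"\n'
    '"value1","Line1\\\nLine2","value3"\n'
)
reader = csv.reader(
    io.StringIO(data),
    delimiter=",",
    quotechar='"',
    escapechar="\\",
    doublequote=False,  # we do not use "" doubling (RFC 4180)
)
rows = list(reader)
# rows[0][1] is 'Hello, "World"'; rows[1][1] is 'Line1' + newline + 'Line2'
```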

1.3 Null and Empty Values

  • NULL in the source is written as an empty quoted field: "".
  • Empty string is also "".

Your loader may treat both as NULL or empty string depending on your rules.


2. How You Should Use the Data

To avoid bad records, misaligned columns, and “values spilling to the next line,” your reader must use the same format options we use when writing.

2.1 Required Parser Settings

When reading these CSV files, configure your parser as follows:

  • Delimiter: , (comma)
  • Quote character: " (double quote)
  • Escape character: \ (backslash)
  • Header: first row = column names
  • Encoding: UTF-8
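If your analytics tool is pandas (an assumption; any escape-aware reader works), the options above map directly onto read_csv parameters:

```python
import io

import pandas as pd

sample = (
    '"column_a","column_b","column_c"\n'
    '"value1","Hello\\, \\"World\\"","value3"\n'
)
df = pd.read_csv(
    io.StringIO(sample),
    sep=",",            # delimiter: comma
    quotechar='"',      # quote character: double quote
    escapechar="\\",    # escape character: backslash
    doublequote=False,  # no RFC 4180 "" doubling in these files
    header=0,           # first row = column names
)
# When reading from a file path, also pass encoding="utf-8".
```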

2.2 Steps to Load the Data Correctly

  1. Use a CSV parser that supports an escape character

    Ensure the parser can use backslash (\) as the escape character, not only doubled double-quotes ("").

  2. Set delimiter, quote, and escape explicitly

    • Delimiter: comma.
    • Quote: double quote.
    • Escape: backslash.

    Do not rely on “auto” or RFC 4180–only behavior if that would ignore the escape character.
  3. Treat the first row as the header

    Use it for column names; do not treat it as data.

  4. Use UTF-8 when reading

    Open or decode the file as UTF-8 to avoid issues with special characters.

  5. Handle empty quoted fields

    Decide how "" should be interpreted (e.g. NULL or empty string) in your schema or bronze layer.
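The steps above can be sketched as a small Python loader (names like load_report are ours; adjust the empty-string rule to your own schema):

```python
import csv
import io

def load_report(fileobj):
    """Read one report file with the required parser settings and
    return a list of dicts keyed by the header row. Per step 5,
    empty quoted fields ("") are mapped to None here."""
    reader = csv.reader(
        fileobj,
        delimiter=",",
        quotechar='"',
        escapechar="\\",
        doublequote=False,
    )
    header = next(reader)  # step 3: first row = column names
    return [
        {col: (val if val != "" else None) for col, val in zip(header, row)}
        for row in reader
    ]

# For real files: open(path, encoding="utf-8", newline="") per step 4.
sample = '"column_a","column_b","column_c"\n"value1","","value3"\n'
rows = load_report(io.StringIO(sample))
```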

2.3 Steps to Open the Files in Excel

Best practice: Import the file into Excel (do NOT double-click it)

  1. Open Excel
  2. Go to Data → Get Data → From Text/CSV
  3. Select the report file (file names may end in a part suffix such as 000)
  4. In the import dialog:
    • Delimiter: Comma
    • Text qualifier: " (double quote)
  5. Load the data

This forces Excel to respect proper CSV quoting rules.

The configuration should be:

  • Set File Origin to UTF-8.
  • Set Delimiter to Comma.
  • Set fields to be enclosed by " (double quote).
  • Set the escape character to \ (backslash).

3. What to Avoid

  • Do not assume RFC 4180 only (i.e. that the only escape inside quoted fields is ""). Our files use \", so RFC 4180–only parsers will misparse and can produce bad records or “double quotes inside values” errors.
  • Do not assume one physical line = one record. Newlines inside fields are escaped as \n; only an escape-aware parser can reconstruct the full field.
  • Do not split on every comma. Commas inside fields are escaped as \,.
  • Do not ignore the escape character. Without escape = \, parsers will handle \", \n, and \, incorrectly.
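A concrete sketch (Python's csv module) of the first pitfall: the same physical line parsed with and without the escape character:

```python
import csv
import io

line = '"value1","Hello\\, \\"World\\"","value3"\n'

# Escape-aware configuration (correct for these files).
good = next(csv.reader(io.StringIO(line), escapechar="\\", doublequote=False))

# RFC 4180-only defaults: no escapechar, "" doubling assumed.
bad = next(csv.reader(io.StringIO(line)))

# good[1] is the intended value; bad[1] keeps the raw backslashes and
# stray quotes because the escapes were never interpreted.
```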

4. File Layout (Reference)

Report files are written under paths such as:

  • Shared / central reports:

    s3://s3-data-layer/prod/reports/new/<report_name>_report/

    (e.g. booking_report, flights_report, services_report, etc.)

  • Per-operator data (BDS / data stream):

    s3://fl3xx-data-lake/operator_<operatorid>/<report_name>/

  • Per-operator data lake buckets:

    s3://fl3xx-data-lake-prod-operator-<id>/<report_name>/

    (e.g. Airports, booking_report, flights_report, services_report)

File names and prefixes may include part numbers (e.g. 000001) depending on the UNLOAD configuration. Use the same CSV options for all these files.


5. Summary

We use the following settings; your reader should use the same:

  • Delimiter: comma (,)
  • Fields enclosed in double quotes (")
  • Escape character: backslash (\), required for correct parsing
  • First row = header
  • Encoding: UTF-8

Configuring your reader with delimiter comma, quote double quote, and escape backslash will align with how we prepare the data and should prevent bad records, misaligned columns, and incorrect handling of newlines and commas inside fields.