Data transformation is not applied in AWS Glue

I have an S3 bucket with folders that contain files. I want to build a database so that I can query these documents by several keys through a Lambda-based API. To do that, I need to normalize the data. For example, I need to transform every file in the /jomalone/ folder from this:

{
  "data": {
    "products": {
      "items": [
        {
          "default_category": {
            "id": "25956",
            "value": "Bath & Body"
          },
          "description": "London's Covent Garden early morning market.  Succulent nectarine, peach and cassis and delicate spring flowers melt into the note of acacia honey.  Sweet and delightfully playful.  Our luxuriously rich Body Crème with its conditioning oils of jojoba seed, cocoa seed and sweet almond, help to hydrate, nourish and protect the skin, while delicious signature fragrances leave your body scented all over.",
          "display_name": "Nectarine Blossom & Honey Body Crème",
          "is_hazmat": false,
          "meta": {
            "description": "The Jo Malone™ Nectarine Blossom & Honey Body Crème leaves skin beautifully scented with fruity notes of nectarine and peach sweetened with acacia honey."
          },
          "product_badge": null,
          "product_id": "10024",
          "product_url": "/product/25956/10024/bath-body/nectarine-blossom-honey-body-creme",
          "short_description": "A caring Body Crème infused with the succulent scent of Nectarine Blossom & Honey.",
          "tags": {
            "total": 2,
            "items": [
              {
                "id": "25956",
                "value": "Bath & Body",
                "key": "bath-body"
              },
              {
                "id": "26087",
                "value": "Nectarine Blossom & Honey Scent",
                "key": "nectarine-blossom-honey-scent"
              }
            ]
          },
          "cross_sell": [
            {
              "sku_id": "L0Y401",
              "sort_key": 1
            },
            {
              "sku_id": "L0YF01",
              "sort_key": 2
            },
            {
              "sku_id": "L01G01",
              "sort_key": 3
            },
            {
              "sku_id": "L8CC01",
              "sort_key": 4
            },
            {
              "sku_id": "L8CA01",
              "sort_key": 5
            },
            {
              "sku_id": "L7XW01",
              "sort_key": 6
            }
          ],
          "maincat": [
            {
              "key": "bathbody-maincat",
              "value": "bathbody_maincat"
            }
          ],
          "subcat": [
            {
              "key": "bodycare-subcat",
              "value": "bodycare_subcat"
            }
          ],
          "media": null,
          "reviews": {
            "average_rating": null,
            "number_of_reviews": null
          },
          "usage": [
            {
              "content": "Take a generous amount of our luxurious Body Crème and massage into skin.",
              "label": "HOW DOES IT WORK",
              "type": "how_does_it_work"
            }
          ],
          "fragrance_family": [
            {
              "key": "fruity-fragrance",
              "value": "fruity_fragrance"
            }
          ],
          "style": [
            {
              "key": "decadent-style",
              "value": "decadent_style"
            }
          ],
          "mood": [
            {
              "key": "cosy-mood",
              "value": "cosy_mood"
            }
          ],
          "skus": {
            "total": 1,
            "items": [
              {
                "is_default_sku": false,
                "is_discountable": true,
                "is_giftwrap": false,
                "is_under_weight_hazmat": false,
                "iln_listing": "Ingredients: Water\\Aqua\\Eau, Glycerin, Cetearyl Alcohol, Simmondsia Chinensis (Jojoba) Seed Oil, Fragrance (Parfum), Glyceryl Stearate, Stearic Acid, Triethanolamine, Theobroma Cacao (Cocoa) Seed Butter, Prunus Amygdalus Dulcis (Sweet Almond) Oil, Isopropyl Palmitate, Dimethicone, Aloe Barbadensis Leaf Juice, Bisabolol, Caffeine, Cocamidopropyl Pg-Dimonium Chloride Phosphate, Glyceryl Laurate, Hexylene Glycol, Caprylyl Glycol, Disodium Edta, Citral, Limonene, Citronellol, Linalool, Phenoxyethanol <ILN47239>",
                "iln_version_number": "ILN47239",
                "inventory_status": "Active",
                "material_code": "L4P8010000",
                "prices": [
                  {
                    "currency": "EUR",
                    "is_discounted": false,
                    "include_tax": {
                      "price": 68,
                      "original_price": 68,
                      "price_per_unit": 38.86,
                      "price_formatted": "€68.00",
                      "original_price_formatted": "€68.00",
                      "price_per_unit_formatted": "€38.86 / 100ML"
                    }
                  }
                ],
                "sizes": [
                  {
                    "value": "175ML",
                    "key": 1
                  }
                ],
                "shades": [
                  {
                    "name": "",
                    "description": "",
                    "hex_val": ""
                  }
                ],
                "sku_id": "L4P801",
                "sku_badge": null,
                "unit_size_formatted": "100ML",
                "upc": "690251040254",
                "is_engravable": null,
                "perlgem": {
                  "SKU_BASE_ID": 63584
                },
                "media": {
                  "large": [
                    {
                      "src": "/media/export/cms/products/1000x1000/jo_sku_L4P801_1000x1000_0.png",
                      "alt": "Nectarine Blossom & Honey Body Crème",
                      "height": 1000,
                      "width": 1000
                    },
                    {
                      "src": "/media/export/cms/products/1000x1000/jo_sku_L4P801_1000x1000_1.png",
                      "alt": "Nectarine Blossom & Honey Body Crème",
                      "height": 1000,
                      "width": 1000
                    }
                  ],
                  "medium": [
                    {
                      "src": "/media/export/cms/products/670x670/jo_sku_L4P801_670x670_0.png",
                      "alt": "Nectarine Blossom & Honey Body Crème",
                      "height": 670,
                      "width": 670
                    }
                  ],
                  "small": [
                    {
                      "src": "/media/export/cms/products/100x100/jo_sku_L4P801_100x100_0.png",
                      "alt": "Nectarine Blossom & Honey Body Crème",
                      "height": 100,
                      "width": 100
                    }
                  ]
                },
                "collection": null,
                "recipient": [
                  {
                    "key": "mom-recipient",
                    "value": "mom_recipient"
                  },
                  {
                    "key": "bride-recipient",
                    "value": "bride_recipient"
                  },
                  {
                    "key": "host-recipient",
                    "value": "host_recipient"
                  },
                  {
                    "key": "me-recipient",
                    "value": "me_recipient"
                  },
                  {
                    "key": "her-recipient",
                    "value": "her_recipient"
                  }
                ],
                "occasion": [
                  {
                    "key": "thankyou-occasion",
                    "value": "thankyou_occasion"
                  },
                  {
                    "key": "birthday-occasion",
                    "value": "birthday_occasion"
                  },
                  {
                    "key": "treat-occasion",
                    "value": "treat_occasion"
                  }
                ],
                "location": [
                  {
                    "key": "bathroom-location",
                    "value": "bathroom_location"
                  }
                ]
              }
            ]
          }
        }
      ]
    }
  }
}

into JSON with the following schema:

brandName     String
productName   String 
productLink   String
productType   ?
maleFemale    Male/Female
price         float
unitPrice     String
size          float
ingredients   String
notes         String
numReviews    Int
userIDs       float
locations     float
dates         Date
ages          int
sexes         M/F
ratings       Int
reviews       Array of String
sources       String
characteristics  String
specificRatings  String
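
To make the target concrete, here is a minimal sketch in plain Python (no Glue) of how one product item could be flattened into rows of this schema, one row per SKU. The choice of source fields (display_name for productName, product_url for productLink, the include_tax price for price, and so on) is an assumption and not something the schema states. The function names and the brand_name parameter are hypothetical, and only a subset of the target columns is filled in.

```python
import re
from typing import Any, Dict, List, Optional


def parse_size(text: Optional[str]) -> Optional[float]:
    """Extract the first number from a size label such as '175ML'."""
    match = re.search(r"\d+(?:\.\d+)?", text or "")
    return float(match.group()) if match else None


def normalize_item(item: Dict[str, Any], brand_name: str) -> List[Dict[str, Any]]:
    """Flatten one product item into one row per SKU (assumed field mapping)."""
    reviews = item.get("reviews") or {}
    rows = []
    for sku in (item.get("skus") or {}).get("items") or []:
        # Fall back to an empty record when the list is missing or empty.
        prices = sku.get("prices") or [{}]
        tax = prices[0].get("include_tax") or {}
        sizes = sku.get("sizes") or [{}]
        price = tax.get("price")
        rows.append({
            "brandName": brand_name,  # not present in the source JSON
            "productName": item.get("display_name"),
            "productLink": item.get("product_url"),
            "price": float(price) if price is not None else None,
            "unitPrice": tax.get("price_per_unit_formatted"),
            "size": parse_size(sizes[0].get("value")),
            "ingredients": sku.get("iln_listing"),
            "numReviews": reviews.get("number_of_reviews"),
        })
    return rows


# A trimmed copy of the sample document's item (ingredients shortened here).
example_item = {
    "display_name": "Nectarine Blossom & Honey Body Crème",
    "product_url": "/product/25956/10024/bath-body/nectarine-blossom-honey-body-creme",
    "reviews": {"average_rating": None, "number_of_reviews": None},
    "skus": {"items": [{
        "iln_listing": "Ingredients: Water\\Aqua\\Eau, Glycerin",
        "prices": [{"include_tax": {"price": 68,
                                    "price_per_unit_formatted": "€38.86 / 100ML"}}],
        "sizes": [{"value": "175ML", "key": 1}],
    }]},
}

rows = normalize_item(example_item, brand_name="Jo Malone")
```

Columns such as userIDs, ratings or reviews have no obvious counterpart in the sample file, so they are left out of the sketch.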

So I tried AWS Glue, but I don't know how to get rid of the nested wrapper keys at the top of the document:

  "data": {
    "products": {
      "items": [
          ...
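
Outside of Glue, the unwrapping being asked for amounts to walking down data → products → items and emitting each element as its own record. A minimal sketch in plain Python, assuming each file holds one document shaped like the sample; the function name iter_product_items is hypothetical:

```python
import json
from typing import Any, Dict, Iterator


def iter_product_items(document: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield each element of data.products.items, tolerating missing keys."""
    products = (document.get("data") or {}).get("products") or {}
    for item in products.get("items") or []:
        yield item


# A heavily trimmed stand-in for one file from the bucket.
sample = json.loads(
    '{"data": {"products": {"items": ['
    '{"product_id": "10024", "is_hazmat": false}'
    "]}}}"
)

items = list(iter_product_items(sample))
```

Each yielded item is a top-level record, so the data/products/items wrapper no longer appears in the output.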

I did test some changes to the field names:

[image]

But judging by the Preview tab, this does not have the effect I was looking for:

[image]

I did delete the first and last underlined fields and changed the others, but none of this seems to be reflected in the preview.

In fact, the script generated from the visual job does not even seem to contain the mapping:

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Script generated for node S3 bucket
S3bucket_node1 = glueContext.create_dynamic_frame.from_options(
    format_options={"multiline": False},
    connection_type="s3",
    format="json",
    connection_options={"paths": ["s3://datahubpredicity/JoMalone/"], "recurse": True},
    transformation_ctx="S3bucket_node1",
)

# Script generated for node ApplyMapping
ApplyMapping_node2 = ApplyMapping.apply(
    frame=S3bucket_node1,
    mappings=[("data.products.items", "array", "data.products.items", "array")],
    transformation_ctx="ApplyMapping_node2",
)

# Script generated for node S3 bucket
S3bucket_node3 = glueContext.write_dynamic_frame.from_options(
    frame=ApplyMapping_node2,
    connection_type="s3",
    format="json",
    connection_options={"path": "s3://datahubpredicity/merged/", "partitionKeys": []},
    transformation_ctx="S3bucket_node3",
)

job.commit()
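
The mappings list passed to ApplyMapping holds tuples of (source path, source type, target path, target type). A small check in plain Python, with the hypothetical helper identity_mappings, confirms that the only generated entry maps a field onto itself, which is consistent with the output looking unchanged:

```python
from typing import List, Tuple

# (source path, source type, target path, target type)
Mapping = Tuple[str, str, str, str]


def identity_mappings(mappings: List[Mapping]) -> List[Mapping]:
    """Return the entries whose target path and type equal their source."""
    return [m for m in mappings if m[0] == m[2] and m[1] == m[3]]


# The mappings list copied from the generated script above.
generated = [("data.products.items", "array", "data.products.items", "array")]

unchanged = identity_mappings(generated)
```

Here unchanged equals generated, so none of the renames or deletions made in the visual editor reached the script.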

Answers (0):