Hello, I'm Victor Sotero

Senior Data Engineer

Databricks | Big Data | AI | Python | SQL | AWS | Spark | Data Lakehouse | MCP

Gothenburg, Sweden

About Me

I am a Data Engineer with over five years of experience developing scalable and reliable big data solutions. Based in Gothenburg, Sweden, I work as a Senior Data Engineer through Software by Quokka, currently with Geely Technology Europe, following an engagement with Zeekr Technology Europe from April 2025 to May 2026. Software by Quokka is a high-end software engineering firm serving leading enterprises in sectors such as automotive, telecom, MedTech, and FinTech.

Previously, at Globant, I played a pivotal role in designing, implementing and optimizing data workflows for major clients, leading to improved data processing efficiency and reduced operational costs. My work supported critical business functions and enabled data-driven decision-making in large-scale enterprises.

Throughout my career, I have handled data for millions of users and processed billions of transactions. With a strong foundation in Python and SQL, coupled with hands-on experience in data modeling, ETL processes, and cluster management, I consistently deliver high-quality, scalable solutions.

4+
Years Experience

1B+
Transactions Processed

5+
Major Enterprises

Skills & Technologies

Data Processing

PySpark 95%

Databricks 90%

Delta Lake 90%

Apache Spark 90%

Hadoop 75%

Hive 80%

Cloud & Infrastructure

AWS (EMR, S3, Redshift) 90%

Airflow 85%

Docker 75%

Kafka 80%

Programming

Python 95%

SQL 95%

TypeScript 70%

Node.js 70%

Data Engineering

ETL/ELT 95%

Change Data Capture 90%

Data Modeling 85%

Data Lakehouse 90%

Experience

Software by Quokka

January 2025 - Present

Senior Data Engineer

Greater Gothenburg Metropolitan Area

High-end software engineering firm offering services to leading enterprises in automotive, telecom, MedTech, and FinTech sectors.

Developed innovative HR solutions including Employee Portal and User Management Service
Created KPI dashboard for employee metrics with integrated feedback services
Built standardized CV generator to streamline employee documentation

TypeScript Node.js React AWS

Geely Technology Europe

May 2026 - Present

Senior Data Engineer (through Software by Quokka)

Gothenburg, Sweden

Geely Technology Europe is Geely Auto Group's unified European R&D organization, integrating engineering hubs across Sweden and Germany after the evolution of Zeekr Technology Europe.

Continuing the EU-integrated data platform work across Geely Technology Europe operations
Evolving Databricks + Unity Catalog data products from PoC and stakeholder demos into production solutions
Designing and orchestrating governed pipelines that ingest, canonicalize, and serve data back to users and markets

PySpark Delta Lake Databricks AWS Redshift MySQL NoSQL

Zeekr Technology Europe

April 2025 - May 2026

Senior Data Engineer (through Software by Quokka)

Gothenburg, Sweden

Zeekr Technology Europe was Geely's Gothenburg-based European R&D organization before becoming part of Geely Technology Europe.

Built an EU-integrated data platform uniting scattered sources into a single Databricks + Unity Catalog layer
Aligned customer-data flows with EU Data Act compliance requirements
Designed and orchestrated pipelines to ingest, canonicalize, govern, and serve data back to users and markets
Integrated data resources across European operations
Ran PoCs and stakeholder demos that are maturing into production solutions

PySpark Delta Lake Databricks AWS Redshift MySQL NoSQL

Globant

October 2022 - January 2025

Consulting engagements

JCPenney

January 2024 - January 2025

Senior Data Engineer (through Globant)

Remote

Major American department store chain with over 650 locations and annual revenues exceeding $11 billion.

Built the entire Marketing Technology data platform in-house, replacing an outsourced solution and significantly reducing data processing costs
Enabled data ownership for JCPenney by migrating data and by optimizing and fixing pipelines and dashboards
Leveraged AWS EMR, Airflow, and Spark capabilities to deliver high-quality datasets for decision-making
Handled daily development and maintenance of big data processing pipelines on AWS EMR
Ingested data from various sources into Redshift for business intelligence and analytics

PySpark AWS EMR Airflow Redshift Marketing Tech

Adobe

October 2022 - December 2023

Data Engineer (through Globant)

Remote

Adobe Creative Cloud projects, supporting a global software leader serving millions of creative professionals worldwide with annual revenues over $17 billion.

Part of the team that migrated terabyte-scale data and processing from on-premises to AWS
Participated in workload migration from on-premises (Hive, Cloudera, Trino, Oozie) to cloud (AWS EMR, S3, KDA, QuickSight)
Utilized AWS EMR for processing data with S3 for storage and data lake purposes

AWS EMR S3 Hive PySpark Trino QuickSight

Dadosfera

March 2022 - July 2022

Consulting engagements

Banco Inter

March 2022 - July 2022

Data Engineer (through Dadosfera)

Brazil

Leading digital bank in Brazil with millions of customers.

Migrated SQL procedures to PySpark for scalable processing
Processed Parquet data delivered to S3 by a Kafka-based CDC platform
Curated and materialized data into Delta Tables
Used Airflow to orchestrate ephemeral EMR cluster deployments

EMR S3 PySpark Kafka Delta Lake Airflow

Ciclic

July 2021 - March 2022

Data Analyst

São Paulo, Brazil

Brazilian financial services company focused on innovative insurance and investment solutions.

Led the implementation of a data catalog tool enabling self-serve data analytics
Developed ETL pipelines using Airflow from multiple sources (MySQL, RDS, S3, APIs) to Redshift
Conducted data cleansing, transformation, and modeling using dbt
Analyzed CRM data, generating insights that achieved a 15x higher conversion rate on top campaigns

Airflow Redshift dbt Python

Hamoye.com

July 2020 - December 2020

Data Intern

Remote

Assisted in data cleaning and preprocessing to ensure data quality
Collaborated with data science and engineering teams on ETL pipelines
Performed ad-hoc data analysis and generated reports for business insights
Automated repetitive data tasks through scripting

Ilhasoft Tecnologia (now Weni)

May 2019 - May 2020

NLP Junior Researcher

Brazil

Conducted Natural Language Processing research focusing on chatbot interactions
Developed and evaluated models using Python and NLP libraries
Collaborated on refining conversational flows for Weni's platform

NEES - Núcleo de Excelência em Tecnologias Sociais

June 2017 - May 2019

Graduate Research Assistant

Maceió, Brazil

Published research at the International Conference on Computer Supported Education (CSEDU) on the use of educational technologies in medical education

Featured Projects

EU Data Platform - Zeekr & Geely Technology Europe

Building an EU-integrated data platform that started under Zeekr Technology Europe and continues under Geely Technology Europe, uniting scattered data sources into a single Databricks + Unity Catalog layer and ensuring compliance with the EU Data Act across European operations.

Impact: Enabling data governance and analytics across European operations

Databricks Unity Catalog PySpark Terraform GitHub Actions SQL AWS

Cloud Migration - Adobe

Engineered the migration of terabyte-scale big data workloads from legacy on-premises infrastructure (Cloudera, Hive, Oozie) to a modern AWS cloud architecture. Optimized data pipelines for cost and performance.

Impact: Reduced operational costs and improved scalability for millions of users

AWS EMR S3 Hive PySpark AWS Glue Terraform SQL

Marketing Technology Platform - JCPenney

Built the entire Marketing Technology data platform from scratch, replacing an expensive outsourced solution. Ingested data from various marketing sources into a unified platform.

Impact: Saved significant costs in data processing while enabling full data ownership for 650+ store locations

PySpark AWS EMR Airflow Redshift Python SQL Terraform

CDC Data Platform - Banco Inter

Acted as a Data Engineer on projects within Banco Inter, migrating SQL procedures to PySpark to scale processing for Brazil's leading digital bank. Utilized EMR, S3, PySpark, Kafka, Delta Lake, and Airflow to build robust data pipelines.

Impact: Processing billions of transactions for millions of customers by migrating legacy procedures to scalable PySpark workflows

Kafka Delta Lake PySpark Airflow Spark Streaming AWS EMR

Education & Certifications

Education

Universidade Federal de Alagoas

Bachelor's Degree in Computer Science

Maceió, Brazil

XP Educação

Post-graduate Course in Data Engineering Tools and Services

April 2022 - July 2022

XP Educação

Post-graduate Course in Data Science Tools and Services

May 2021 - July 2021

Certifications

Claude Code in Action

Anthropic

Introduction to Model Context Protocol

Anthropic

Databricks Certified Data Engineer Associate

Databricks

Applied Machine Learning in Python

University of Michigan

Applied Plotting, Charting & Data Representation in Python

University of Michigan

C1 Level Cambridge English Certificate of Advanced English (CAE)

Cambridge Assessment English

Publications

A Systematic Review on the Use of Educational Technologies for Medical Education

CSEDU - International Conference on Computer Supported Education (2019)

Contact

Email victorvcdb@gmail.com

Sweden +460760029014

Brazil +55 82 988524394

LinkedIn Victor Sotero

Location Gothenburg, Sweden