Ayoub Fakir.
I build data platforms.
Senior Data Engineer & Architect. Scala · Rust · Go · Python. Distributed systems, functional programming, blockchain.
- Certified Kubernetes Administrator (CKA-2000-008592-0100)
01 About
I'm a Senior Data Engineer & Architect with a decade of building distributed systems, data platforms, and the infrastructure that keeps them honest. I write Scala, Rust, Go, and Python (sometimes Clojure), and I have strong opinions about Functional Programming, blockchain data, and how teams should work with data.
I've led platforms at Décathlon, Glassnode, Algolia, and HPE: moving petabytes, untangling pipelines, and migrating things from where they shouldn't be to where they should. On the side, I teach Data Engineering at Paris-Est Créteil University.
02 Experience
Senior Data Engineer · Décathlon
Led the migration of Décathlon’s data pipelines from Talend/Redshift to Spark/Scala on AWS Databricks.
- Lead Data Engineer on the PerfECO project, moving Talend/Redshift workloads to a Spark/Scala pipeline.
- Data validation at scale (batch + streaming) with Cats / ZIO on POSLog.
- Trained team members on Functional Programming and distributed programming.
- Migrated Spark/Redshift workloads to AWS Databricks.
- Built an agent-based distributed streaming system for ingestion.
- Member of the architecture committee across multiple Décathlon projects.
- Scala
- Spark
- Databricks
- Cats
- ZIO
- AWS
Senior Data Engineer · Glassnode
Led the data platform ingesting and exposing on-chain data from dozens of vendors.
- Ingested blockchain data from CoinGecko, CoinMarketCap, Sonar, Dune, ETF feeds and more.
- Set up the Medallion architecture in GCP, integrated Snowflake and BigQuery.
- Modernized the platform with Airflow (Composer), Spark/Dataproc, dbt.
- Introduced Lakehouse formats (Delta / Iceberg).
- GCP
- Snowflake
- BigQuery
- Airflow
- dbt
- Delta
- Iceberg
Senior Data Engineer · Algolia
Ingestion and processing of petabytes per month on the Data Platform team.
- Ingestion via Kafka and Kinesis from cloud providers and vendors (Salesforce, …).
- Migrated Stitch → Meltano; framework to automate API/DB ingestions (Airflow + deferred ECS Tasks).
- Spark with EMR and AWS Glue. dbt framework for Analytics Engineers.
- Led Redshift → Databricks/Snowflake feasibility studies and PoCs.
- Lakehouse datalake on Delta Lake + Databricks.
- Kafka
- Kinesis
- Spark
- EMR
- Glue
- dbt
- Delta
- Databricks
Senior DataOps · Air Liquide
5-month mission
Architected and shipped an in-house Airflow data platform on EKS.
- Studied the feasibility of a fleet of Airflow deployments on EKS and validated it with stakeholders.
- Set up DEV/PROD infrastructure via Terraform; deployments via Kubernetes / Helm.
- Automated cluster chores via the Airflow API.
- Managed cross-team role access; shipped the first dbt / Spark DAGs.
- Airflow
- EKS
- Terraform
- Helm
- dbt
- Spark
Senior Scala / Data Engineer · Hewlett Packard Enterprise
Core Team contributor on the Harmony platform.
- Maintained and extended the Harmony platform; admin of the team’s Kubernetes cluster.
- CI/CD with GitHub Actions and Jenkins (legacy).
- Scala features with ZIO and Cats.
- Co-designed the Complex Event Processing engine from scratch.
- Messaging via Kafka / Pulsar.
- Scala
- ZIO
- Cats
- Kafka
- Pulsar
- Kubernetes
DataOps · HydraDX.io
Infra for Polkadot/Kusama parachains and analytics.
- IaC with Terraform, Consul, Rundeck, Ansible, GitHub Actions.
- Managed the full project infrastructure (Polkadot + Kusama parachains).
- Analytics infra: Scala/Spark + ZIO on EMR; ad-hoc on Zeppelin / JupyterHub.
- Ephemeral testnet deployments for the Runtime team via GitHub Actions + K8s.
- Terraform
- Polkadot
- Kusama
- Spark
- ZIO
- Kubernetes
DataOps & Senior Data Engineer · Alterway Cloud Consulting
Lead data engineer for client audits and infra-as-code engagements.
- IaC for various clients (Terraform, CDK).
- Data infrastructure and job-performance audits.
- Kubernetes ecosystem on AWS.
- Terraform
- CDK
- AWS
- Kubernetes
Senior Data Engineer · Andjaro
Industrialized a full data pipeline; built a data catalog for product teams.
- Audited the existing data infrastructure.
- Pipeline industrialization: Kubernetes, EMR, Airflow, Jenkins, Athena, Glue.
- Spark/Scala cleaning + aggregation jobs.
- Data Catalog implementation for product teams.
- EMR
- Airflow
- Athena
- Glue
- Spark
Senior Data Engineer · Voodoo.io
Daily-TB Spark jobs and a fresh datalake on AWS.
- Spark/Scala cleaning and aggregation across terabytes daily.
- Built a datalake on AWS; Airflow workflows; Kubernetes.
- CI/CD with CircleCI / S3.
- Spark
- AWS
- Airflow
- Kubernetes
- CircleCI
Senior Data Engineer · Société Générale
via Devoteam
Spark migration and Kafka/NiFi ingestion.
- Migrated jobs from Spark 1.x to 2.x.
- Ingested data through Kafka and NiFi.
- CI/CD with Jenkins / Nexus.
- Spark
- Kafka
- NiFi
Data Engineer · AXA Data Innovation Lab
via Devoteam
Cloudera platform admin + GDPR compliance project.
- Spark/Scala cleaning and normalization of AXA’s entities data.
- Cloudera cluster admin as part of the Platform Team.
- Platform KPIs and YARN resource management.
- On-site interventions for internal clients (Hong Kong, Germany, Spain, France).
- GDPR compliance project: deletion / update of users’ sensitive data on request.
- Spark
- Cloudera
- Hadoop
- YARN
Data Consultant · Devoteam Technology
Pre-sales technical lead and internal Big Data trainer.
- Tech / architecture lead on business opportunities.
- Conducted technical interviews.
- Big Data trainer at Devoteam University (Spark, Scala, Python, Hadoop).
- Knowledge Community Leader: internal social-network articles on Big Data.
- Spark
- Scala
- Python
- Hadoop
Data Engineer · Crédit Agricole CIB
Hortonworks Hadoop clusters and PoCs.
- Built and maintained Hortonworks Hadoop clusters.
- Various proofs-of-concept.
- Hadoop
- Hortonworks
03 Skills
Programming
- Scala
- Python
- Go
- Rust
- Clojure
Cloud
- AWS
- GCP
- Azure
Data Platforms
- Databricks
- Snowflake
Data
- Hadoop
- Spark
- Hudi
- Delta
- Iceberg
FP / Frameworks
- ZIO
- Cats
- Cats Effect
NoSQL
- Cassandra
- HBase
- DynamoDB
- MongoDB
CI/CD
- GitHub Actions
- CircleCI
- Jenkins
- GitLab CI
Blockchain
- Polkadot
- Hyperledger
- Ethereum
04 Education
05 Beyond Code
B23 Dossier de Vol
✈ Aviation
A Streamlit app to prep VFR flights on the Bristell B23 (F-HBTI / F-HRDV). Reproduces the 9 sections of an ACAF flight dossier: weather, NOTAM, navlog with wind triangle, performance interpolation, fuel planning, weight & balance with envelope checks, and PDF export. Built it because I'm training as a pilot at Aéroclub Air France and wanted prep to be faster.
- Python
- Streamlit
- pandas
- matplotlib
- fpdf2