Résumé
Ayoub Fakir
Data Engineer & Data Architect, Scala / Rust / Go / Python developer, Distributed and Blockchain Systems.
Certified Kubernetes Administrator (CKA-2000-008592-0100)
Functional Data Modeling Enthusiast
[ ayoub[at]fakir.dev ]
Senior Data Engineer, Décathlon
- Joined as a Lead Data Engineer on the PerfECO project, to migrate from Talend and Redshift to a Spark/Scala-based data pipeline.
- Data validation at scale (batch and streaming) using FP techniques and frameworks (Cats/ZIO) for POSLog.
- Trained team members to get up to speed with functional programming in Scala and distributed programming.
- Migration of all Spark / Redshift workloads to AWS Databricks.
- Worked closely with the “Data Factory” team to improve the overall quality of Décathlon's data pipelines and data architectures across teams.
- Implementation of an agent-based distributed streaming system for Data Ingestion.
- Part of the architecture committee for several other projects within Décathlon.
Senior Data Engineer, Glassnode
- Led the Data Platform efforts to ingest, process, and expose blockchain data from different providers.
- Ingestion of blockchain data from various vendors (CoinGecko, CoinMarketCap, Sonar, Dune, …).
- Ingestion of ETF data and general digital-asset market data.
- Setup of the Medallion (layered) architecture in GCP.
- Integration of Snowflake and BigQuery.
- Restructuring of Glassnode's data platform to use modern data tools: Airflow (Composer), Spark/Dataproc, DBT.
- Introduction of Lakehouse formats (Delta/Iceberg).
Senior Data Engineer, Algolia
- Part of the Data Engineering and Data Platform team, central to the whole Algolia organization, ingesting and serving petabytes of data monthly.
- Ingestion of data using Kafka and Kinesis from various sources, including external cloud providers and vendors (Salesforce).
- Migration from Stitch to Meltano and creation of a framework to automate API and database ingestion (orchestrated by Airflow and run as deferred ECS tasks).
- Data processing using Spark on EMR and AWS Glue.
- Implementation of a framework leveraging DBT, for use by analytics engineers.
- Study of the migration from Redshift to Databricks and Snowflake (communicating with both companies, leading the PoCs and feasibility studies).
- Lakehouse architecture using Delta Lake and Databricks.
Senior Data Ops, Air Liquide
Short five-month mission to architect and implement an in-house Airflow data platform:
- Study the feasibility of running a fleet of Airflow clusters on EKS.
- Set up the architecture and validate it with the stakeholders.
- Study how the solution integrates with the Air Liquide IS.
- Set up the infrastructure in DEV / PROD environments via Terraform.
- Automate the deployment of Airflow clusters via Kubernetes / Helm.
- Automate routine maintenance tasks via the Airflow API.
- Manage role-based access between Airflow / EKS and the accounts of external initiative teams.
- Implement the first DBT / Spark DAGs in the Airflow environments.
Senior Scala/Data Engineer, Hewlett Packard Enterprise
- Part of the Harmony team, contributing as a member of the Core Team to build and maintain the Harmony Platform.
- Management and administration of the team's Kubernetes cluster.
- Added features to the overall architecture of Harmony, to continue supporting internal partners and speed up the deployment of their pipelines through Harmony.
- CI/CD using GitHub Actions and Jenkins (legacy).
- Added new features to Harmony using Scala with the ZIO and Cats frameworks.
- Part of the team that designed and developed the Complex Event Processing engine from scratch.
- Messaging using Kafka/Pulsar.
DataOps, HydraDX.io
- Infrastructure as Code (Terraform, Consul, Rundeck, Ansible, GitHub Actions).
- Management and setup of the project's entire infrastructure (Polkadot and Kusama parachains).
- Setup of the analytics infrastructure: Scala / Spark jobs for data preparation using ZIO, EMR for running workloads, and ad-hoc analysis using Zeppelin/JupyterHub.
- Automated, ephemeral testnet deployments for the Runtime team using GitHub Actions and Kubernetes.
DataOps and Senior Data Engineer, Alterway Cloud Consulting
- Infrastructure as Code for various clients (Terraform, CDK).
- Lead Data Engineer for data infrastructure and job performance audits.
- DevOps (Kubernetes ecosystem, AWS).
Senior Data Engineer, Andjaro
- Audit of the existing Data Infrastructure.
- Deployment and Industrialisation of a full Data Pipeline: Kubernetes, EMR, Airflow, Jenkins, Athena, AWS Glue.
- Spark / Scala cleaning and aggregation jobs.
- Implementation of a Data Catalog for Product teams.
Senior Data Engineer, Voodoo.io
- Spark / Scala jobs for cleaning and aggregating terabytes of data daily.
- Building a data lake on AWS.
- Airflow workflows.
- Kubernetes.
- CI / CD with CircleCI / S3.
Senior Data Engineer, Société Générale (Devoteam Technology Consulting)
- Migrating jobs from Spark 1.x to Spark 2.x.
- Ingesting Data through Kafka and NiFi.
- CI / CD with Jenkins / Nexus.
Data Engineer, AXA Data Innovation Lab (Devoteam Technology Consulting)
- Spark / Scala jobs for cleaning and normalizing AXA's entity data.
- Administration of a Cloudera Cluster as part of the Platform Team.
- CI / CD with Jenkins / Nexus.
- Jobs for calculating the platform's KPIs and managing YARN resources.
- On-site interventions for internal clients (Hong Kong, Germany, Spain, France).
- GDPR Project: a compliance project for deleting / updating users' sensitive data upon request.
Data Consultant, Devoteam Technology
- Responding to business opportunities from a technical / architectural standpoint.
- Conducting technical interviews for new candidates.
- Internal Big Data trainer for Devoteam University (Spark, Scala, Python, Hadoop).
- Knowledge Community Leader: producing articles and content about Big Data on the internal social network.
Data Engineer, Crédit Agricole CIB
- Building and maintaining Hortonworks Hadoop clusters.
- Various proofs of concept.
Master's Degree, Paris 12 University (2015)
- Engineering of Distributed Systems.
Programming: Scala, Python, Go, Clojure (and whatever language I need to manipulate, really, excluding JavaScript!)
Cloud Providers: AWS, GCP, Azure
Data Platforms and Warehousing: Databricks, Snowflake
NoSQL: Cassandra, HBase, DynamoDB, MongoDB
Data: Hadoop, Spark, Hudi, Delta
CI: CircleCI, Jenkins, GitlabCI, GithubActions
Versioning: Git
Blockchain: Polkadot Ecosystem, Hyperledger, Ethereum
- Odyssey Hackathon Winner, https://www.youtube.com/watch?v=ZfQoCk4kq3Y
- Morocco Web Awards Winner, https://kezakoo.com
- Paris Est Créteil University: Data Engineering