Résumé
Ayoub Fakir
Data Engineer & Data Architect, Scala / Rust / Go / Python developer, Distributed and Blockchain Systems.
Certified Kubernetes Administrator (CKA-2000-008592-0100)
Functional Data Modeling Enthusiast
[ ayoub[at]fakir.dev ]
Senior Data Engineer, Décathlon
- Joined as a Lead Data Engineer on the PerfECO project, to migrate from Talend and Redshift to a Spark/Scala-based data pipeline.
- Data validation at scale (batch and streaming) using FP techniques and frameworks (Cats/ZIO) for POSLog.
- Trained team members to get up to speed with functional programming in Scala and distributed programming.
- Migration of all Spark / Redshift workloads to AWS Databricks.
- Worked closely with the “Data Factory” team to improve the overall quality of Décathlon's data pipelines and data architectures across teams.
- Implementation of an agent-based distributed streaming system for Data Ingestion.
- Part of the architecture committee for several other projects within Décathlon.
Senior Data Engineer, Glassnode
- Led the Data Platform efforts to ingest, process, and expose blockchain data from different providers.
- Ingestion of blockchain data from various vendors (CoinGecko, CoinMarketCap, Sonar, Dune, …).
- Ingestion of ETF data and general digital-asset market data.
- Setup of the Medallion (layered) architecture in GCP.
- Integration of Snowflake and BigQuery.
- Restructuring of Glassnode's data platform to use modern data tools: Airflow (Composer), Spark/Dataproc, DBT.
- Introduction of Lakehouse formats (Delta/Iceberg).
Senior Data Engineer, Algolia
- Part of the Data Engineering and Data Platform team, central to the whole Algolia organization, ingesting and serving petabytes of data monthly.
- Ingestion of data using Kafka and Kinesis from various sources, including external cloud providers and vendors (Salesforce).
- Migration from Stitch to Meltano and creation of a framework to automate API and database ingestion (orchestrated by Airflow and run as deferred ECS tasks).
- Data processing using Spark on EMR and AWS Glue.
- Implementation of a framework leveraging DBT, for use by analytics engineers.
- Study of the migration from Redshift to Databricks and Snowflake (communicating with both companies, leading the PoCs and feasibility studies).
- Lakehouse architecture using Delta Lake and Databricks.
Senior Data Ops, Air Liquide
Short five-month mission to architect and implement an in-house Airflow data platform:
- Study the feasibility of running a fleet of Airflow clusters on EKS.
- Set up the architecture and validate it with the stakeholders.
- Study how the solution integrates with the Air Liquide IS.
- Set up the infrastructure in DEV / PROD environments via Terraform.
- Automate the deployment of Airflow clusters via Kubernetes / Helm.
- Automate routine maintenance tasks via the Airflow API.
- Manage role-based access between Airflow / EKS and the accounts of external initiative teams.
- Implement the first DBT / Spark DAGs in the Airflow environments.
Senior Scala/Data Engineer, Hewlett Packard Enterprise
- Part of the Harmony team, contributing as a member of the Core Team to build and maintain the Harmony Platform.
- Management and administration of the team's Kubernetes cluster.
- Added features to the overall architecture of Harmony, to continue supporting internal partners and speed up the deployment of their pipelines through Harmony.
- CI/CD using GitHub Actions and Jenkins (legacy).
- Added new features to Harmony using Scala with the ZIO and Cats frameworks.
- Part of the team that designed and developed the Complex Event Processing engine from scratch.
- Messaging using Kafka/Pulsar.
DataOps, HydraDX.io
- Infrastructure as Code (Terraform, Consul, Rundeck, Ansible, GitHub Actions).
- Management and setup of the project's entire infrastructure (Polkadot and Kusama parachains).
- Setup of the analytics infrastructure: Scala / Spark jobs for data preparation using ZIO, EMR for running workloads, and ad-hoc analysis using Zeppelin/JupyterHub.
- Automated, ephemeral testnet deployments for the Runtime team using GitHub Actions and Kubernetes.
DataOps and Senior Data Engineer, Alterway Cloud Consulting
- Infrastructure as Code for various clients (Terraform, CDK).
- Lead Data Engineer for data infrastructure and job performance audits.
- DevOps (Kubernetes ecosystem, AWS).
Senior Data Engineer, Andjaro
- Audit of the existing Data Infrastructure.
- Deployment and Industrialisation of a full Data Pipeline: Kubernetes, EMR, Airflow, Jenkins, Athena, AWS Glue.
- Spark / Scala cleaning and aggregation jobs.
- Implementation of a Data Catalog for Product teams.
Senior Data Engineer, Voodoo.io
- Spark / Scala jobs for cleaning and aggregating terabytes of data daily.
- Building a data lake on AWS.
- Airflow workflows.
- Kubernetes.
- CI / CD with CircleCI / S3.
Senior Data Engineer, Société Générale (Devoteam Technology Consulting)
- Migrating jobs from Spark 1.x to Spark 2.x.
- Ingesting Data through Kafka and NiFi.
- CI / CD with Jenkins / Nexus.
Data Engineer, AXA Data Innovation Lab (Devoteam Technology Consulting)
- Spark / Scala jobs for cleaning and normalizing AXA's entity data.
- Administration of a Cloudera Cluster as part of the Platform Team.
- CI / CD with Jenkins / Nexus.
- Jobs for calculating the platform's KPIs and managing YARN resources.
- On-site interventions for internal clients (Hong Kong, Germany, Spain, France).
- GDPR Project: a compliance project for deleting / updating users' sensitive data upon request.
Data Consultant, Devoteam Technology
- Responding to business opportunities from a technical / architectural standpoint.
- Conducting technical interviews for new candidates.
- Internal Big Data trainer for Devoteam University (Spark, Scala, Python, Hadoop).
- Knowledge Community Leader: producing articles and content about Big Data on the internal social network.
Data Engineer, Crédit Agricole CIB
- Building and maintaining Hortonworks Hadoop clusters.
- Various proofs of concept.
Master's Degree, Paris 12 University (2015)
- Engineering of Distributed Systems.
Programming: Scala, Python, Go, Clojure (and whatever language I need to manipulate, really, excluding JavaScript!)
Cloud Providers: AWS, GCP, Azure
Data Platforms and Warehousing: Databricks, Snowflake
NoSQL: Cassandra, HBase, DynamoDB, MongoDB
Data: Hadoop, Spark, Hudi, Delta
CI: CircleCI, Jenkins, GitlabCI, GithubActions
Versioning: Git
Blockchain: Polkadot Ecosystem, Hyperledger, Ethereum
- Odyssey Hackathon Winner, https://www.youtube.com/watch?v=ZfQoCk4kq3Y
- Morocco Web Awards Winner, https://kezakoo.com
- Paris Est Créteil University: Data Engineering