Résumé on Ayoub Fakir

Recent content in Résumé on Ayoub Fakir
Last updated: Wed, 02 Feb 2022 18:27:58 +0100

About
/about/ (Fri, 09 Apr 2021 00:00:00 +0000)
Hi! I'm Ayoub, a Senior Data Engineer absolutely passionate about data technologies. I mainly work on Distributed Systems (Hadoop / Kubernetes / Cloud Technologies), Functional Programming (Haskell / Scala / Clojure), Rust, and Blockchain Technologies (Ethereum / Bitcoin / Hyperledger, and more recently the Polkadot ecosystem). I do consulting as well as teaching (Paris 12 University). You can contact me to say hi, to talk about your projects, or to hire me: ayoub[at]fakir.

[FR] Moving from EMR to Kubernetes for Spark Workloads
/post/migrer-emr-vers-kubernetes/ (Thu, 18 Feb 2021 04:26:07 +0200)
Introduction: AWS EMR is a widely used AWS service, mainly for processing massive data with Apache Spark in a dedicated Hadoop cluster. Beyond its main function, EMR ships with a good number of open-source tools, some for monitoring (Ganglia) and others for querying data (Hive). More information can be found here. Depending on the context, EMR can be used either as an ephemeral cluster (for example, launching a cluster every 6 hours to run Spark jobs) or as a permanent cluster.

[EN] Migrating from a plain Spark Application to ZparkIO
/post/migrating-to-zparkio/ (Fri, 16 Oct 2020 10:36:00 +0200)
Migrating from a plain Spark Application to ZIO with ZparkIO: in this article, we'll see how you can migrate your Spark application to ZIO and ZparkIO, so you can benefit from all the wonderful features that ZIO offers, which we'll be discussing. What is ZIO? According to the official documentation, ZIO is a library for asynchronous and concurrent programming that is based on pure functional programming.
In other words, ZIO helps us write type-safe, composable, and easily testable code, all while keeping it safe and free of side effects.

[EN] Building a CI/CD pipeline for a Spark project using GitHub Actions, SBT and AWS S3 - Part 2
/post/ci-cd-sbt-s3-github-actions-p2/ (Wed, 29 Apr 2020 13:01:24 +0200)
In the first article of this series, we talked about how to set up a CI/CD pipeline for a Spark project using GitHub Actions, SBT as a build tool, and S3 for deployment. Once pushed to the [master] branch of our project on GitHub, our code triggered an SBT build command that generated a fat jar, which was then pushed to the chosen S3 bucket. However, this pipeline still lacks a way to add logic: for instance, it does not let us check whether the version of the jar we are uploading to S3 already exists.

[EN] CI/CD pipeline using GitHub Actions, SBT and AWS S3 - Part 1
/post/ci-cd-sbt-s3-github-actions/ (Wed, 08 Apr 2020 04:35:59 +0200)
Thanks to GitHub Actions, GitHub now lets us build continuous integration and continuous deployment workflows for our repositories on almost all GitHub plans. In this tutorial, we're going to build a CI/CD pipeline for a Scala / Spark project. We will use SBT, the Scala Build Tool, to produce a jar that we will then deploy to AWS S3 using a custom GitHub Action.

[EN] On Minimalistic Teaching
/post/on-minimalistic-teaching/ (Thu, 06 Feb 2020 10:00:37 +0200)
First… The education system today faces many challenges around the world, at all levels. That said, since the “education problem” is a huge subject, we can only solve it by addressing small problems one at a time; the sum of these solutions may lead us to solving the bigger issue.
For instance, one of the issues in higher education is that a teacher is either academic or professional: the former has a theoretical focus (and does not teach students how to tackle real-world problems based on what she teaches them), whereas the latter is more focused on practical applications (and might not have the pedagogical tools or know-how).

[FR] Bitcoin Explained to My Mother
/post/bitcoin-explique-a-ma-mere-fr/ (Thu, 08 Nov 2018 04:26:07 +0200)
Today, my mother told me that one of her primary-school pupils had talked to her about a great “revolution” called Bitcoin. “But what is this thing that's going to kill the banks?” she wondered. That is exactly why I decided to explain Bitcoin in this article to my mother, and to all the moms who might read it! You see, Mom, a large majority of those who know the principles behind Bitcoin are antisocial geeks who only speak binary and look down on anyone who doesn't know it; I was one of them at one point, before realizing that a tie suited me well too!

[EN] 10+ Great Books for Functional Programming in Scala
/post/10-books-scala/ (Fri, 17 Mar 2017 05:47:36 +0200)
This article was co-authored by Matthew Rathbone. Image by Thomas Leuthard. James Gosling, creator of Java, said: “If I were to pick a language to use today other than Java, it would be Scala.” Scala is a hot language in software development today; it is used by a range of start-ups for application development and has been adopted as the unofficial language of big data software development thanks to frameworks like Spark.

Why combine asynchronous and distributed calculations to tackle the biggest data quality challenges
/post/asynchronous-calculations-zio/ (Fri, 17 Mar 2017 05:47:36 +0200)
Article co-authored by Martin Delobel and available on Medium.

[EN] 10+ Great Books for Apache Spark
/post/10-books-spark/ (Fri, 13 Jan 2017 05:45:12 +0200)
This article was co-authored by Matthew Rathbone. Image by Ed Robertson. Apache Spark is a super useful distributed processing framework that works well with Hadoop and YARN. Many industry users have reported it to be 100x faster than Hadoop MapReduce for certain memory-heavy tasks, and 10x faster when processing data on disk. While Spark has incredible power, it is not always easy to find good resources or books to learn more about it, so I thought I'd compile a list.

[EN] The Truth Behind the Bigdata Buzz Word
/post/truth-behind-bigdata-buzz-word/ (Mon, 10 Oct 2016 04:27:41 +0200)
Big Data… Really? A few years ago, I had a discussion with a mentor of mine about the career path I wanted to pursue, and I said: “Look, Big Data is something really great, and I want to become a Big Data Engineer later on!” His answer was: “Okay, but be cautious: Big Data is not a revolution, and just like with the ‘Cloud’, marketers have done their jobs.” I didn't trust his words back then, and… You bet!