[EN] Building a CI/CD pipeline for a Spark project using Github Actions, SBT and AWS S3 — Part 2
Last updated: Jul 1, 2021
In the first article of this series, we saw how to set up a CI/CD pipeline for a Spark project using Github Actions, with SBT as the build tool and S3 as the deployment target. Once pushed to the master branch of our project on Github, our code triggered an SBT build command that generated a fat jar, which was then uploaded to the chosen S3 bucket.
However, this pipeline cannot yet express any conditional logic: for instance, it does not let us check whether the version of the jar we are uploading already exists on S3.
To add that kind of logic (or expand on it if necessary), we have to create our own Github Action. This article shows, step by step, how to write a custom Github Action (and publish it!) for our previous CI/CD pipeline.
The official documentation describes several types of Github Actions; the one of interest to us in this article is the Docker container action.
The first step is to create a Dockerfile that will allow us to run AWS commands through the AWS CLI (also note that we can use boto3 as well for more complex actions like creating an EMR Cluster and running the newly-deployed jar).
In our example, we will check whether the jar we want to upload already exists in our S3 bucket; if it does, the pipeline stops and the newly built jar is not uploaded.
First, we need a Dockerfile that will look like the following:
FROM python:3.7-alpine
LABEL "com.github.actions.name"="Jar to S3 with SBT Build"
LABEL "com.github.actions.description"="Generate a Jar File with SBT from a Scala / Spark project and upload it to S3 with conditions"
LABEL "com.github.actions.icon"="copy"
LABEL "com.github.actions.color"="green"
LABEL version="0.1.0"
LABEL repository="https://github.com/<GithubUsername>/RepositoryName"
LABEL homepage="fakir.dev"
LABEL maintainer="Ayoub Fakir <ayoub@fakir.dev>"
ENV AWSCLI_VERSION='1.16.232'
RUN pip install --quiet --no-cache-dir awscli==${AWSCLI_VERSION}
ADD entrypoint.sh /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
Second, an entrypoint.sh script will help us implement our logic:
#!/bin/sh
set -e

# Write the credentials passed in through the workflow's env block.
mkdir -p ~/.aws
echo "[default]
aws_access_key_id = ${AWS_ACCESS_KEY_ID}
aws_secret_access_key = ${AWS_SECRET_ACCESS_KEY}" > ~/.aws/credentials

# Always remove the credentials when the script exits, even on failure.
trap 'rm -rf ~/.aws' EXIT

# List the bucket and look for the jar. "|| true" keeps "set -e" from
# aborting the script when grep finds no match.
check_existence=$(aws s3 ls "s3://${AWS_S3_BUCKET}" --region "${AWS_REGION}" | grep jar_name_and_version.jar || true)

if [ -z "$check_existence" ]
then
  # FILE is left unquoted on purpose so that the shell expands its wildcard.
  aws s3 cp ${FILE} "s3://${AWS_S3_BUCKET}" --region "${AWS_REGION}" "$@"
else
  echo "Jar already exists"
  exit 1
fi
Our entrypoint.sh script checks whether the jar exists in the S3 bucket, then runs a simple if/else. Of course, our AWS Access Key ID and Secret Access Key are stored in Github's secrets, as we saw in part 1 of this series.
Then, we can create our workflow.yml file, which is almost identical to the one we created in the previous article; only the path of the upload step changes, since it now points to the repository of our own Github Action.
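One caveat with the grep-based check: grep matches substrings and treats "." as a wildcard, so a name such as My_Spark_Project-0.1.0.jar would also match My_Spark_Project-0.1.0.jar.bak. As a minimal sketch of a stricter check, the hypothetical helper below (jar_in_listing is not part of the original script) compares the object-name column of an `aws s3 ls` listing exactly. It assumes jar names contain no spaces, and the sample listing is made-up data:

```shell
#!/bin/sh
# jar_in_listing NAME: reads `aws s3 ls` output on stdin and succeeds only
# if one object is named exactly NAME (hypothetical helper).
jar_in_listing() {
  # Columns of `aws s3 ls`: date, time, size, object name.
  awk -v name="$1" '$4 == name { found = 1 } END { exit !found }'
}

# Made-up listing in the format printed by `aws s3 ls`.
sample_listing='2021-07-01 10:00:00    1048576 My_Spark_Project-0.1.0.jar
2021-07-01 10:05:00    1048576 My_Spark_Project-0.1.0.jar.bak'

if printf '%s\n' "$sample_listing" | jar_in_listing "My_Spark_Project-0.1.0.jar"; then
  echo "Jar already exists"
fi
```

In the entrypoint, the same function could be fed by piping the real listing into it instead of the sample text.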
name: CI CD
on:
  push:
    branches: [ master ]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up JDK 1.8
        uses: actions/setup-java@v1
        with:
          java-version: 1.8
      - name: Build Fat Jar
        run: sbt assembly
      - name: S3 Upload Jar
        uses: <githubUserName>/name-of-your-github-action@master
        env:
          AWS_S3_BUCKET: ${{ secrets.AWS_S3_BUCKET }}/jar/
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          AWS_REGION: 'eu-west-3'
          FILE: 'target/scala-2.11/My_Spark_Project-*.jar'
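Note that FILE contains a wildcard, while the entrypoint script looks for the fixed placeholder jar_name_and_version.jar. One possible way to bridge the two, sketched here under assumptions, is to resolve the wildcard to a single file and take its base name. The helper resolve_jar and the temporary demo directory are hypothetical and only stand in for target/scala-2.11; paths are assumed to contain no spaces:

```shell
#!/bin/sh
# resolve_jar PATTERN: prints the single file matching PATTERN, or fails
# when zero or several files match (hypothetical helper).
resolve_jar() {
  # Unquoted on purpose: the shell expands the wildcard into positional args.
  set -- $1
  [ "$#" -eq 1 ] && [ -f "$1" ] || return 1
  printf '%s\n' "$1"
}

# Throwaway directory standing in for target/scala-2.11.
demo_dir=$(mktemp -d)
touch "$demo_dir/My_Spark_Project-0.1.0.jar"

jar_path=$(resolve_jar "$demo_dir/My_Spark_Project-*.jar")
jar_name=$(basename "$jar_path")
echo "$jar_name"   # prints My_Spark_Project-0.1.0.jar
```

Failing when several jars match is a deliberate choice in this sketch: it avoids uploading an unexpected build when stale jars are left in the target directory.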
All set! Finally, it is a good idea to add a README.md file to let people know how to use your action.
Then, we push everything to our repository while tagging our release:
git add .
git commit -m "My Awesome Github Action"
git tag -a -m "First version OK" v1
git push --follow-tags
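The -a flag matters here: git push --follow-tags only sends annotated tags that point at the commits being pushed, and leaves lightweight tags behind. The sketch below reproduces the commands above against a throwaway local bare repository so this can be checked without touching Github. The directory layout, the identity settings and the extra lightweight tag are all assumptions made for the demo:

```shell
#!/bin/sh
set -e

# Throwaway bare repository standing in for the Github remote.
work=$(mktemp -d)
git init --quiet --bare "$work/remote.git"
git clone --quiet "$work/remote.git" "$work/action" 2>/dev/null
cd "$work/action"

# Placeholder identity so the demo works on a machine without git config.
git config user.name "Example Author"
git config user.email "author@example.com"

echo "# My Github Action" > README.md
git add .
git commit --quiet -m "My Awesome Github Action"
git tag -a -m "First version OK" v1   # annotated tag, as in the article
git tag v1-lightweight                # lightweight tag, for comparison only
git push --quiet --follow-tags origin HEAD

# Only the annotated tag reaches the remote.
remote_tags=$(git ls-remote --tags origin)
echo "$remote_tags"
```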
Voilà! If you’re happy with the Github Action you just created, you can publish it to the official Github Marketplace. Github shows us a step-by-step how-to here.