Reduce Docker image size

Rough idea

  • Keep an eye on the layers (FROM, RUN, COPY).
  • You can check the size added by each step with docker history.
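
As a rough illustration of the second point, the sketch below wraps docker history in a small shell function that prints the size of each layer next to the instruction that created it. The function name layer_sizes and the image name in the usage line are placeholders, not part of the original notes.

```shell
# Print one line per layer: its size, a tab, then the instruction that created it.
# layer_sizes is a hypothetical helper name; pass any local image as the argument.
layer_sizes() {
    docker history --no-trunc --format '{{.Size}}\t{{.CreatedBy}}' "$1"
}
```

For example, layer_sizes ubuntu:latest lists the layers of the ubuntu image, newest first.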

Docker commands

Review temporary files

  • Wrong (248 MB): a separate layer for each instruction. Files deleted by a later RUN still take up space in the layers that created them.

    FROM ubuntu
    RUN apt-get update
    RUN apt-get install -y wget xz-utils
    # clean up apt cache
    RUN rm -rf /var/lib/apt/lists/*
    RUN wget https://nodejs.org/dist/v12.13.1/node-v12.13.1-linux-x64.tar.xz -O nodejs.tar.xz
    RUN tar xf nodejs.tar.xz
    RUN mkdir /opt/node && cp -r node-v12.13.1-linux-x64/* /opt/node/
    # clean up compressed files
    RUN rm -rf nodejs.tar.xz node-v12.13.1-linux-x64
  • Right (62.7 MB): combine everything into a single RUN layer, so temporary files are removed before the layer is committed.

    FROM ubuntu
    RUN apt-get update \
        && apt-get install -y wget xz-utils \
        && rm -rf /var/lib/apt/lists/* \
        && wget https://nodejs.org/dist/v12.13.1/node-v12.13.1-linux-x64.tar.xz -O nodejs.tar.xz \
        && tar xf nodejs.tar.xz \
        && mkdir /opt/node && cp -r node-v12.13.1-linux-x64/* /opt/node/ \
        && rm -rf nodejs.tar.xz node-v12.13.1-linux-x64

Avoid using COPY

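  • COPY creates a separate layer, so files copied only temporarily still count toward the image size even if a later instruction deletes them.
  • Where possible, download files with wget inside the RUN layer that uses them.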

Use multistage builds

You can build a temporary image and then use its filesystem while building the final image.
Install all the dependencies and compile all the sources in the intermediate image, then copy only the resulting artifacts into the final one.

  • Wrong (260.5 MB): the build dependencies stay in the image.

    FROM debian:9.6-slim as compiler

    # install apache thrift dependencies
    RUN apt-get update && apt-get install libstdc++ autoconf gcc \
        tk-dev pkg-config libxft-dev build-essential wget curl make -y

    # download and install apache thrift
    RUN curl http://archive.apache.org/dist/thrift/0.11.0/thrift-0.11.0.tar.gz \
        | tar zx
    RUN cd thrift-0.11.0/ \
        && ./configure --without-cpp \
        && make \
        && make install \
        && cd .. \
        && rm -rf thrift-0.11.0
  • Right (55.5 MB): compile in a first stage and copy only the binary into the final image.

    ## 1st image: original
    FROM debian:9.6-slim as compiler

    # install apache thrift dependencies
    RUN apt-get update && apt-get install libstdc++ autoconf gcc \
        tk-dev pkg-config libxft-dev build-essential wget curl make -y

    # download and install apache thrift
    RUN curl http://archive.apache.org/dist/thrift/0.11.0/thrift-0.11.0.tar.gz \
        | tar zx
    RUN cd thrift-0.11.0/ \
        && ./configure --without-cpp \
        && make \
        && make install \
        && cd .. \
        && rm -rf thrift-0.11.0

    ## 2nd image on multistage
    FROM debian:9.6-slim

    # copy thrift binary
    COPY --from=compiler /usr/local/bin/thrift /usr/local/bin/thrift

Java case

Use jlink to create a JRE with only the modules you need.

FROM adoptopenjdk/openjdk11:x86_64-ubuntu-jdk-11.28 as compiler
# create minimal jre
RUN jlink --module-path /opt/java/openjdk/jmods --verbose \
    --add-modules java.base,java.logging,java.xml,java.scripting,jdk.jdwp.agent \
    --compress 2 \
    --no-header-files \
    --output /opt/jre-11-minimal

# reduces the previous one to almost half of its size
FROM baseline
RUN mkdir -p /usr/lib/jvm
COPY --from=compiler /opt/jre-11-minimal /opt/jre
RUN ln -s /opt/jre/bin/java /usr/bin/java
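
To decide which modules to pass to --add-modules, the JDK's jdeps tool can list the modules a jar depends on. The sketch below is one possible way to do that, not necessarily how the list above was chosen; the function name required_modules and the jar path in the usage line are hypothetical. The result may still need manual additions, such as jdk.jdwp.agent for remote debugging, which application code does not reference directly.

```shell
# Print the comma-separated list of JDK modules that a jar depends on.
# required_modules is a hypothetical helper name; jdeps ships with the JDK.
required_modules() {
    jdeps --print-module-deps "$1"
}
```

For example, required_modules target/application.jar prints a list that can be pasted after --add-modules.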

Reduce Java dependencies

Many images share the same dependencies, so these can be moved into a separate, shared layer.

  • Using Jib, a strategy similar to jlink.

    • One-line method

      # containerize the application
      mvn compile com.google.cloud.tools:jib-maven-plugin:2.3.0:build \
          -Dimage=<MY IMAGE>
      # build to the local Docker daemon
      mvn compile com.google.cloud.tools:jib-maven-plugin:2.3.0:dockerBuild
    • Full configuration method

      1. Set up the Maven project in pom.xml

        <project>
          <build>
            <plugins>
              <plugin>
                <groupId>com.google.cloud.tools</groupId>
                <artifactId>jib-maven-plugin</artifactId>
                <version>2.3.0</version>
                <configuration>
                  <to>
                    <image>myimage</image>
                  </to>
                </configuration>
              </plugin>
            </plugins>
          </build>
        </project>
      2. Change the image value to point to a valid registry

        <!-- Google Container Registry -->
        <image>gcr.io/my-gcp-project/my-app</image>
        <!-- Amazon Elastic Container Registry (ECR) -->
        <image>aws_account_id.dkr.ecr.region.amazonaws.com/my-app</image>
        <!-- Docker Hub registry -->
        <image>docker.io/my-docker-id/my-app</image>
      3. Build image

        # build the container image and push it to the registry
        mvn compile jib:build
        # build to the local Docker daemon
        mvn compile jib:dockerBuild
        # build an image tarball
        mvn compile jib:buildTar
  • Manual method: extract the dependencies from a project and, if they were updated, push them to your registry.

    1. On Maven with Spring Boot: add the excludeGroupIds parameter to your build plugin configuration.

      <properties>
        <libraries.excludeGroupIds></libraries.excludeGroupIds>
      </properties>
      <!-- more stuff here -->
      <plugin>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-maven-plugin</artifactId>
        <version>${springboot.version}</version>
        <configuration>
          <finalName>application</finalName>
          <excludeGroupIds>${libraries.excludeGroupIds}</excludeGroupIds>
          <classifier>run</classifier>
          <layout>ZIP</layout>
        </configuration>
      </plugin>
    2. Use a bash script.

      # collect the group IDs of all dependencies as a comma-separated list
      EXCLUSIONS=$(mvn -o dependency:list | \
          grep ":.*:.*:.*" | \
          cut -d] -f2- | \
          sed 's/:.*$//g' | \
          sort -u | \
          paste -d, -s - | \
          tr -d "[:blank:]")
      # build jarfile
      mvn install -Dlibraries.excludeGroupIds="${EXCLUSIONS}"
      mvn dependency:copy-dependencies -DincludeScope=runtime \
          -Dmdep.prependGroupId=true -DoutputDirectory='./lib'
    3. Check whether the dependencies changed (for example, by checking the MD5 hash of the pom.xml files).

      DEPENDENCIES_HASHES=$(md5sum pom.xml module1/pom.xml \
          module2/pom.xml module3/pom.xml \
          | md5sum | cut -d ' ' -f 1)

      dep_tag="${DOCKER_REGISTRY}/java-dependencies:${DEPENDENCIES_HASHES}"

      # a failed pull means the image does not exist yet
      if ! docker pull "${dep_tag}"; then
          cp -r ./lib ./dependencies/
          # this directory contains dependencies.dockerfile
          cd ./dependencies/
          docker build -t "${dep_tag}" -f dependencies.dockerfile .
          docker push "${dep_tag}"
      fi
    4. Create dependencies.dockerfile

      FROM custom-jre-build:jre11

      RUN mkdir -p /opt/application/lib/
      COPY ./lib /opt/application/lib/
    5. Pass the dependencies hash to the application's Dockerfile through an ARG instruction and use it as the tag of the FROM image.

      docker build --build-arg version=${DEPENDENCIES_HASHES} \
      -f application.dockerfile .

      # application dockerfile
      ARG version
      FROM java-dependencies:$version

      COPY application.jar /opt/application/application.jar
      COPY run_java.sh /opt/erudite/run_java.sh
      RUN chmod +x /opt/erudite/run_java.sh

      CMD ["java", "-jar", "/opt/application/application.jar"]
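
The shell pipeline in step 2 and the hash in step 3 can be hard to follow inline. The sketch below isolates both as small functions so they can be tried without Maven or Docker. The function names group_ids and files_hash are illustrative assumptions, and md5sum is assumed to be available (GNU coreutils).

```shell
# Turn `mvn dependency:list` output (read from stdin) into a sorted,
# comma-separated list of unique group IDs.
group_ids() {
    grep ':.*:.*:.*' \
        | cut -d] -f2- \
        | sed 's/:.*$//' \
        | tr -d '[:blank:]' \
        | LC_ALL=C sort -u \
        | paste -d, -s -
}

# Compute a single MD5 hash over the given files; the result changes
# whenever the content of any of the files changes.
files_hash() {
    md5sum "$@" | md5sum | cut -d ' ' -f 1
}
```

For example, mvn -o dependency:list | group_ids yields the kind of value stored in EXCLUSIONS, and files_hash pom.xml module1/pom.xml yields a string that can serve as the tag of the dependencies image.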

Python case

Reduce dependencies

  1. Review the packages installed in the Dockerfile.

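    # build stage; its FROM line is missing from the source, so <BASE IMAGE> is a placeholder
    FROM <BASE IMAGE> as compiler

    ENV LANG=C.UTF-8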
    ENV CONDA_DIR="/opt/conda"
    ENV PATH="$CONDA_DIR/bin:$PATH"

    RUN apt-get update && apt-get install libstdc++ autoconf openssl \
        libxft-dev build-essential make libevent-dev automake flex bison \
        libssl1.0 gcc tk-dev ca-certificates wget curl libtool \
        libblas3 liblapack3 liblapack-dev libblas-dev pkg-config \
        g++ gfortran libpng-dev -y

    RUN pip install --upgrade pip && pip install --upgrade setuptools

    COPY requirements.txt requirements.txt

    # --no-cache-dir is important!
    RUN export LC_ALL=C && pip install --no-cache-dir -r requirements.txt

    FROM baseline

    RUN apt-get install g++ -y

    # copy python and installed dependencies
    COPY --from=compiler /opt/miniconda3/ /opt/miniconda3/

    RUN ln -s /opt/miniconda3/bin/python /usr/bin/python \
        && ln -s /opt/miniconda3/bin/pip /usr/bin/pip

    CMD ["/bin/bash"]
  2. Create the final Python image based on the dependencies image.

    ARG version
    FROM python-dependencies:$version

    RUN mkdir /opt/application
    COPY scripts/ /opt/application/scripts
    COPY run_python.sh /opt/application/run_python.sh
    RUN chmod +x /opt/application/run_python.sh

    CMD ["/bin/bash"]