Error when running trainMLAgents.py in a Docker container on AWS


#1

Hi,
I followed the docs for training on AWS, and I also added the following line to my Dockerfile:
RUN sed -i 's/docker_training=False/docker_training=True/g' trainMLAgents.py

These are the contents of my Dockerfile:


RUN apt-get clean && apt-get update && apt-get install -y locales
RUN echo "en_US.UTF-8 UTF-8" > /etc/locale.gen && \
    locale-gen
ENV LC_ALL en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US.UTF-8
ENV SHELL /bin/bash

RUN apt-get update && \
    apt-get install -y curl bzip2 xvfb ffmpeg git libxrender1

WORKDIR /aaio

RUN curl -o ~/miniconda.sh -O  https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh  && \
     chmod +x ~/miniconda.sh && \
     ~/miniconda.sh -b -p /opt/conda && \
     rm ~/miniconda.sh && \
     /opt/conda/bin/conda clean -ya && \
     /opt/conda/bin/conda create -n python36 python=3.6 numpy

ENV PATH /opt/conda/envs/python36/bin:/opt/conda/envs/bin:$PATH

RUN pip install animalai

COPY agent.py /aaio/agent.py
COPY data /aaio/data

ENV HTTP_PROXY ""
ENV HTTPS_PROXY ""
ENV http_proxy ""
ENV https_proxy ""

########################################################################################################################
# YOUR COMMANDS GO HERE

# For example, if your agent requires the animalai-train library
# you can add the following (remove if not needed):
RUN pip install animalai-train
RUN git clone https://github.com/beyretb/AnimalAI-Olympics.git
RUN pip uninstall --yes tensorflow
RUN pip install tensorflow-gpu==1.14
RUN apt-get install unzip wget
RUN wget https://www.doc.ic.ac.uk/~bb1010/animalAI/env_linux_v1.0.0.zip
RUN mv env_linux_v1.0.0.zip AnimalAI-Olympics/env/
RUN unzip AnimalAI-Olympics/env/env_linux_v1.0.0.zip -d AnimalAI-Olympics/env/
WORKDIR /aaio/AnimalAI-Olympics/examples
RUN sed -i 's/docker_training=False/docker_training=True/g' trainDopamine.py
RUN sed -i 's/docker_training=False/docker_training=True/g' trainMLAgents.py


########################################################################################################################

CMD ["/bin/bash"]

After building the image and running it with
docker run --runtime=nvidia test-training_3 python trainMLAgents.py
I get the following error:

Mono path[0] = '/aaio/AnimalAI-Olympics/examples/../env/AnimalAI_Data/Managed'
Mono config path = '/aaio/AnimalAI-Olympics/examples/../env/AnimalAI_Data/MonoBleedingEdge/etc'
Preloaded 'libgrpc_csharp_ext.x64.so'
Unable to preload the following plugins:
	ScreenSelector.so
PlayerPrefs - Creating folder: /root/.config/unity3d/Unity Technologies
PlayerPrefs - Creating folder: /root/.config/unity3d/Unity Technologies/Unity Environment
Logging to /root/.config/unity3d/Unity Technologies/Unity Environment/Player.log
trainMLAgents.py:35: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  trainer_config = yaml.load(data_file)
Traceback (most recent call last):
  File "trainMLAgents.py", line 74, in <module>
    env = init_environment(env_path, docker_target_name, no_graphics, worker_id, run_seed)
  File "trainMLAgents.py", line 63, in init_environment
    play=False
  File "/opt/conda/envs/python36/lib/python3.6/site-packages/animalai/envs/environment.py", line 73, in __init__
    aca_params = self.send_academy_parameters(rl_init_parameters_in)
  File "/opt/conda/envs/python36/lib/python3.6/site-packages/animalai/envs/environment.py", line 504, in send_academy_parameters
    return self.communicator.initialize(inputs).rl_initialization_output
  File "/opt/conda/envs/python36/lib/python3.6/site-packages/animalai/envs/rpc_communicator.py", line 79, in initialize
    "The Unity environment took too long to respond. Make sure that :\n"
animalai.envs.exception.UnityTimeOutException: The Unity environment took too long to respond. Make sure that :
	 The environment does not need user interaction to launch
	 The Academy's Broadcast Hub is configured correctly
	 The Agents are linked to the appropriate Brains
	 The environment and the Python interface have compatible versions

Thanks for helping me out.


#2

Hi,

The line you added to the Dockerfile tries to replace a string that does not appear in trainMLAgents.py, so sed changes nothing and docker_training is never set to True. Try replacing the line you added with this one:

RUN sed -i 's/docker_training=docker_training/docker_training=True/g' trainMLAgents.py
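Why the original line failed silently: sed -i rewrites the file in place and exits with status 0 whether or not the pattern matched, so the image builds fine while docker_training keeps its old value. The Python sketch below mimics that substitution but returns the number of matches, which makes a no-op visible. The function name sed_substitute and the one-line sample of trainMLAgents.py are hypothetical stand-ins based on the string quoted above, not the actual file contents.

```python
import re
import tempfile
from pathlib import Path


def sed_substitute(path, pattern, replacement):
    """Rough Python analogue of `sed -i 's/pattern/replacement/g' path`.

    Unlike sed, it returns the number of substitutions made, so a pattern
    that matches nothing can be detected instead of passing silently.
    """
    file = Path(path)
    text = file.read_text()
    new_text, count = re.subn(pattern, replacement, text)  # replaces every match, like /g
    if count:
        file.write_text(new_text)
    return count


if __name__ == "__main__":
    # Hypothetical stand-in for the relevant line of trainMLAgents.py
    sample = "    docker_training=docker_training,\n"
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "trainMLAgents.py"
        script.write_text(sample)

        # Pattern from the original Dockerfile: not present, so 0 matches
        print(sed_substitute(script, "docker_training=False", "docker_training=True"))
        # Pattern from the suggested fix: present, so 1 match
        print(sed_substitute(script, "docker_training=docker_training", "docker_training=True"))
        print(script.read_text().strip())
```

In the Dockerfile itself, a similar guard can be had by appending && grep -q "docker_training=True" trainMLAgents.py to the RUN sed line: grep -q exits non-zero when the string is absent, so the build stops instead of producing an image that times out at runtime.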

#3

Thanks! It worked…