
Training ends shortly after entering play #5999

@popcron

Description

Describe the bug
When running mlagents-learn (with or without --force) and entering Play mode, training exits almost immediately; the console prints Debug.Log output about 33 times before the traceback below appears.

To Reproduce
Steps to reproduce the behavior:

  1. Start training with `mlagents-learn --force` and press Play in the Unity Editor
  2. Observe training close almost immediately, with traceback errors in the console output

Console logs / stack traces

PS C:\repos\the big game\Saw and UFO> mlagents-learn --force
[W ..\torch\csrc\utils\tensor_numpy.cpp:77] Warning: Failed to initialize NumPy: module compiled against API version 0x10 but this version of numpy is 0xe (function operator ())

            ┐  ╖
        ╓╖╬│╡  ││╬╖╖
    ╓╖╬│││││┘  ╬│││││╬╖
 ╖╬│││││╬╜        ╙╬│││││╖╖                               ╗╗╗
 ╬╬╬╬╖││╦╖        ╖╬││╗╣╣╣╬      ╟╣╣╬    ╟╣╣╣             ╜╜╜  ╟╣╣
 ╬╬╬╬╬╬╬╬╖│╬╖╖╓╬╪│╓╣╣╣╣╣╣╣╬      ╟╣╣╬    ╟╣╣╣ ╒╣╣╖╗╣╣╣╗   ╣╣╣ ╣╣╣╣╣╣ ╟╣╣╖   ╣╣╣
 ╬╬╬╬┐  ╙╬╬╬╬│╓╣╣╣╝╜  ╫╣╣╣╬      ╟╣╣╬    ╟╣╣╣ ╟╣╣╣╙ ╙╣╣╣  ╣╣╣ ╙╟╣╣╜╙  ╫╣╣  ╟╣╣
 ╬╬╬╬┐     ╙╬╬╣╣      ╫╣╣╣╬      ╟╣╣╬    ╟╣╣╣ ╟╣╣╬   ╣╣╣  ╣╣╣  ╟╣╣     ╣╣╣┌╣╣╜
 ╬╬╬╜       ╬╬╣╣      ╙╝╣╣╬      ╙╣╣╣╗╖╓╗╣╣╣╜ ╟╣╣╬   ╣╣╣  ╣╣╣  ╟╣╣╦╓    ╣╣╣╣╣
 ╙   ╓╦╖    ╬╬╣╣   ╓╗╗╖            ╙╝╣╣╣╣╝╜   ╘╝╝╜   ╝╝╝  ╝╝╝   ╙╣╣╣    ╟╣╣╣
   ╩╬╬╬╬╬╬╦╦╬╬╣╣╗╣╣╣╣╣╣╣╝                                             ╫╣╣╣╣
      ╙╬╬╬╬╬╬╬╣╣╣╣╣╣╝╜
          ╙╬╬╬╣╣╣╜
             ╙

 Version information:
  ml-agents: 0.30.0,
  ml-agents-envs: 0.30.0,
  Communicator API: 1.5.0,
  PyTorch: 1.13.1+cpu
[W ..\torch\csrc\utils\tensor_numpy.cpp:77] Warning: Failed to initialize NumPy: module compiled against API version 0x10 but this version of numpy is 0xe (function operator ())
[INFO] Listening on port 5004. Start training by pressing the Play button in the Unity Editor.
[INFO] Connected to Unity environment with package version 3.0.0-exp.1 and communication version 1.5.0
[INFO] Connected new brain: Cell?team=0
[WARNING] Behavior name Cell does not match any behaviors specified in the trainer configuration file. A default configuration will be used.
[WARNING] Deleting TensorBoard data events.out.tfevents.1697750225.pop.26916.0 that was left over from a previous run.
[INFO] Hyperparameters for behavior name Cell:
        trainer_type:   ppo
        hyperparameters:
          batch_size:   1024
          buffer_size:  10240
          learning_rate:        0.0003
          beta: 0.005
          epsilon:      0.2
          lambd:        0.95
          num_epoch:    3
          shared_critic:        False
          learning_rate_schedule:       linear
          beta_schedule:        linear
          epsilon_schedule:     linear
        network_settings:
          normalize:    False
          hidden_units: 128
          num_layers:   2
          vis_encode_type:      simple
          memory:       None
          goal_conditioning_type:       hyper
          deterministic:        False
        reward_signals:
          extrinsic:
            gamma:      0.99
            strength:   1.0
            network_settings:
              normalize:        False
              hidden_units:     128
              num_layers:       2
              vis_encode_type:  simple
              memory:   None
              goal_conditioning_type:   hyper
              deterministic:    False
        init_path:      None
        keep_checkpoints:       5
        checkpoint_interval:    500000
        max_steps:      500000
        time_horizon:   64
        summary_freq:   50000
        threaded:       False
        self_play:      None
        behavioral_cloning:     None
[INFO] Exported results\ppo\Cell\Cell-0.onnx
[INFO] Copied results\ppo\Cell\Cell-0.onnx to results\ppo\Cell.onnx.
Traceback (most recent call last):
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\scripts\mlagents-learn.exe\__main__.py", line 7, in <module>
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents\trainers\learn.py", line 264, in main
    run_cli(parse_command_line())
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents\trainers\learn.py", line 260, in run_cli
    run_training(run_seed, options, num_areas)
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents\trainers\learn.py", line 136, in run_training
    tc.start_learning(env_manager)
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents\trainers\trainer_controller.py", line 175, in start_learning
    n_steps = self.advance(env_manager)
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents\trainers\trainer_controller.py", line 233, in advance
    new_step_infos = env_manager.get_steps()
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents\trainers\env_manager.py", line 124, in get_steps
    new_step_infos = self._step()
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents\trainers\subprocess_env_manager.py", line 408, in _step
    self._queue_steps()
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents\trainers\subprocess_env_manager.py", line 302, in _queue_steps
    env_action_info = self._take_step(env_worker.previous_step)
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents\trainers\subprocess_env_manager.py", line 543, in _take_step
    all_action_info[brain_name] = self.policies[brain_name].get_action(
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents\trainers\policy\torch_policy.py", line 130, in get_action
    run_out = self.evaluate(decision_requests, global_agent_ids)
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents\trainers\policy\torch_policy.py", line 93, in evaluate
    masks = self._extract_masks(decision_requests)
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents\trainers\policy\torch_policy.py", line 77, in _extract_masks
    mask = torch.as_tensor(
RuntimeError: Could not infer dtype of numpy.int32
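
For reference, the crash reproduces outside ml-agents once torch's NumPy interop fails to initialize (the "Failed to initialize NumPy" warning at startup): any numpy-array-to-tensor conversion then raises the same error. A minimal sketch under this environment; the array shape and the `masks` name are just illustrative, not the actual ml-agents data:

```python
# Minimal repro sketch (assumes the same broken torch 1.13.1 / numpy pairing
# as above; with a numpy matching torch's compiled C API these lines succeed).
import numpy as np
import torch

masks = np.ones((1, 4), dtype=np.int32)  # stand-in for the action masks built in _extract_masks
tensor = torch.as_tensor(masks)          # raises: RuntimeError: Could not infer dtype of numpy.int32
```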

Environment (please complete the following information):

  • Unity 2023.3.0a10
  • Windows 11, Torch 1.13.1+cpu, Python 3.10.0, numpy 1.12.1
  • The mlagents and mlagents.extensions packages are sourced from the develop branch (as UPM references)
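
If I'm reading NumPy's C API version table correctly, the startup warning (torch built against API 0x10, installed numpy reporting 0xe) means torch 1.13.1 was compiled against numpy 1.23+ while the installed numpy is a 1.20/1.21-era release. A quick diagnostic sketch of my own (not part of ml-agents) to confirm whether torch/numpy interop is broken in a given environment:

```python
# Diagnostic sketch: print the installed versions and probe torch's NumPy interop.
import numpy as np
import torch

print("numpy:", np.__version__)
print("torch:", torch.__version__)
try:
    torch.from_numpy(np.zeros(1, dtype=np.int32))
    print("torch/numpy interop: OK")
except RuntimeError as err:
    # In the broken environment this branch is taken instead of the "OK" print.
    print("torch/numpy interop broken:", err)
```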
