Closed
Labels: bug (Issue describes a potential bug in ml-agents.)
Description
Describe the bug
When running mlagents-learn (with and without --force) and entering Play mode in the Unity Editor, training exits almost immediately and prints Debug.Log calls about 33 times before the traceback below appears.
To Reproduce
Steps to reproduce the behavior:
- Start training with the mlagents-learn command and press Play in the Unity Editor (a fully specified invocation is sketched after this list)
- Observe the training process close, with traceback errors in the console output
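For reference, the run in the log below was started with plain mlagents-learn --force and no trainer configuration file, which is why the Cell behavior falls back to a default configuration. A hedged sketch of a fully specified run follows; the config path and file are hypothetical, with values copied from the defaults printed in the log:

mlagents-learn config/cell.yaml --run-id=Cell --force

# config/cell.yaml (hypothetical file; mirrors the defaults printed below)
behaviors:
  Cell:
    trainer_type: ppo
    hyperparameters:
      batch_size: 1024
      buffer_size: 10240
      learning_rate: 0.0003
    network_settings:
      hidden_units: 128
      num_layers: 2
    max_steps: 500000
    time_horizon: 64
    summary_freq: 50000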
Console logs / stack traces
PS C:\repos\the big game\Saw and UFO> mlagents-learn --force
[W ..\torch\csrc\utils\tensor_numpy.cpp:77] Warning: Failed to initialize NumPy: module compiled against API version 0x10 but this version of numpy is 0xe (function operator ())
[Unity ML-Agents ASCII art banner]
Version information:
ml-agents: 0.30.0,
ml-agents-envs: 0.30.0,
Communicator API: 1.5.0,
PyTorch: 1.13.1+cpu
[W ..\torch\csrc\utils\tensor_numpy.cpp:77] Warning: Failed to initialize NumPy: module compiled against API version 0x10 but this version of numpy is 0xe (function operator ())
[INFO] Listening on port 5004. Start training by pressing the Play button in the Unity Editor.
[INFO] Connected to Unity environment with package version 3.0.0-exp.1 and communication version 1.5.0
[INFO] Connected new brain: Cell?team=0
[WARNING] Behavior name Cell does not match any behaviors specified in the trainer configuration file. A default configuration will be used.
[WARNING] Deleting TensorBoard data events.out.tfevents.1697750225.pop.26916.0 that was left over from a previous run.
[INFO] Hyperparameters for behavior name Cell:
    trainer_type: ppo
    hyperparameters:
      batch_size: 1024
      buffer_size: 10240
      learning_rate: 0.0003
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
      shared_critic: False
      learning_rate_schedule: linear
      beta_schedule: linear
      epsilon_schedule: linear
    network_settings:
      normalize: False
      hidden_units: 128
      num_layers: 2
      vis_encode_type: simple
      memory: None
      goal_conditioning_type: hyper
      deterministic: False
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
        network_settings:
          normalize: False
          hidden_units: 128
          num_layers: 2
          vis_encode_type: simple
          memory: None
          goal_conditioning_type: hyper
          deterministic: False
    init_path: None
    keep_checkpoints: 5
    checkpoint_interval: 500000
    max_steps: 500000
    time_horizon: 64
    summary_freq: 50000
    threaded: False
    self_play: None
    behavioral_cloning: None
[INFO] Exported results\ppo\Cell\Cell-0.onnx
[INFO] Copied results\ppo\Cell\Cell-0.onnx to results\ppo\Cell.onnx.
Traceback (most recent call last):
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\scripts\mlagents-learn.exe\__main__.py", line 7, in <module>
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents\trainers\learn.py", line 264, in main
    run_cli(parse_command_line())
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents\trainers\learn.py", line 260, in run_cli
    run_training(run_seed, options, num_areas)
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents\trainers\learn.py", line 136, in run_training
    tc.start_learning(env_manager)
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents\trainers\trainer_controller.py", line 175, in start_learning
    n_steps = self.advance(env_manager)
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents\trainers\trainer_controller.py", line 233, in advance
    new_step_infos = env_manager.get_steps()
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents\trainers\env_manager.py", line 124, in get_steps
    new_step_infos = self._step()
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents\trainers\subprocess_env_manager.py", line 408, in _step
    self._queue_steps()
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents\trainers\subprocess_env_manager.py", line 302, in _queue_steps
    env_action_info = self._take_step(env_worker.previous_step)
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents\trainers\subprocess_env_manager.py", line 543, in _take_step
    all_action_info[brain_name] = self.policies[brain_name].get_action(
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents\trainers\policy\torch_policy.py", line 130, in get_action
    run_out = self.evaluate(decision_requests, global_agent_ids)
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents\trainers\policy\torch_policy.py", line 93, in evaluate
    masks = self._extract_masks(decision_requests)
  File "C:\Users\phill\AppData\Local\Programs\Python\Python310\lib\site-packages\mlagents\trainers\policy\torch_policy.py", line 77, in _extract_masks
    mask = torch.as_tensor(
RuntimeError: Could not infer dtype of numpy.int32
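The failing call is torch.as_tensor on the int32 action-mask array that ml-agents builds in _extract_masks. Combined with the "Failed to initialize NumPy" warning above, this points to a NumPy C-API mismatch: this torch build was compiled against API version 0x10 while the installed numpy exposes 0xe, so torch's NumPy bindings never load and every numpy-to-tensor conversion fails. A minimal sketch of the same conversion outside ml-agents (it only raises on an install showing that warning; on a healthy install it succeeds):

import numpy as np
import torch

# ml-agents hands action masks to torch as int32 numpy arrays.
# When torch's NumPy bindings fail to initialize (the warning above),
# this raises "RuntimeError: Could not infer dtype of numpy.int32".
mask = np.ones((1, 8), dtype=np.int32)  # shape is illustrative
print(torch.as_tensor(mask).dtype)      # torch.int32 on a working install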
Environment (please complete the following information):
- Unity 2023.3.0a10
- Windows 11, Torch 1.13.1+cpu, Python 3.10.0, numpy 1.12.1
- Package sources for mlagents and mlagents.extensions are from the develop branch (as UPM references)
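A hedged workaround, assuming the crash is the NumPy C-API mismatch from the warning (torch compiled against API 0x10, installed numpy exposing 0xe): reinstall numpy at a version whose C-API matches what this torch build expects. The exact pin below is an assumption, not a verified fix:

# assumption: numpy 1.23.x exposes C-API 0x10, matching this torch 1.13.1 build;
# staying below 1.24 avoids the removed deprecated aliases older stacks may still use
pip install "numpy>=1.23,<1.24"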