@jlebar
Created September 8, 2019 04:06
WARNING: Logging before flag parsing goes to stderr.
W0907 21:06:26.900039 139893622470464 lazy_loader.py:50]
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
* https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
* https://github.com/tensorflow/addons
* https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.
W0907 21:06:28.349731 139893622470464 deprecation_wrapper.py:119] From /usr/local/google/home/jlebar/code/tensor2tensor/tensor2tensor/utils/expert_utils.py:68: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.
W0907 21:06:29.789397 139893622470464 deprecation_wrapper.py:119] From /usr/local/google/home/jlebar/code/tensor2tensor/tensor2tensor/trax/jaxboard.py:38: The name tf.HistogramProto is deprecated. Please use tf.compat.v1.HistogramProto instead.
W0907 21:06:29.789571 139893622470464 deprecation_wrapper.py:119] From /usr/local/google/home/jlebar/code/tensor2tensor/tensor2tensor/trax/jaxboard.py:39: The name tf.Summary is deprecated. Please use tf.compat.v1.Summary instead.
W0907 21:06:29.789673 139893622470464 deprecation_wrapper.py:119] From /usr/local/google/home/jlebar/code/tensor2tensor/tensor2tensor/trax/jaxboard.py:40: The name tf.SummaryMetadata is deprecated. Please use tf.compat.v1.SummaryMetadata instead.
W0907 21:06:29.800125 139893622470464 deprecation_wrapper.py:119] From /usr/local/google/home/jlebar/code/tensor2tensor/tensor2tensor/trax/trainer.py:99: The name tf.enable_eager_execution is deprecated. Please use tf.compat.v1.enable_eager_execution instead.
2019-09-07 21:06:29.865873: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
I0907 21:06:29.866058 139893622470464 trax.py:139] Using --output_dir /usr/local/google/tmp/trax
Using --output_dir /usr/local/google/tmp/trax
I0907 21:06:30.378094 139893622470464 dataset_builder.py:184] Overwrite dataset info from restored data version.
I0907 21:06:30.382681 139893622470464 dataset_builder.py:184] Overwrite dataset info from restored data version.
I0907 21:06:30.384521 139893622470464 dataset_builder.py:253] Reusing dataset mnist (/usr/local/google/tmp/trax/mnist/1.0.0)
I0907 21:06:30.384707 139893622470464 dataset_builder.py:399] Constructing tf.data.Dataset for split train, from /usr/local/google/tmp/trax/mnist/1.0.0
W0907 21:06:30.437631 139893622470464 deprecation.py:323] From /usr/local/google/home/jlebar/.local/lib/python3.6/site-packages/tensorflow/python/data/util/random_seed.py:58: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
2019-09-07 21:06:30.478029: W tensorflow/core/common_runtime/eager/context.cc:371] Added two functions with the same name: __inference_Dataset_flat_map_read_one_file_25
2019-09-07 21:06:30.478126: W tensorflow/core/common_runtime/eager/context.cc:371] Added two functions with the same name: __inference_Dataset_map_ExampleParser.parse_example_39
2019-09-07 21:06:30.537407: W tensorflow/core/common_runtime/eager/context.cc:371] Added two functions with the same name: cond_gif_true_89
2019-09-07 21:06:30.537503: W tensorflow/core/common_runtime/eager/context.cc:371] Added two functions with the same name: cond_gif_false_90
2019-09-07 21:06:30.543452: W tensorflow/core/common_runtime/eager/context.cc:371] Added two functions with the same name: cond_png_true_78
2019-09-07 21:06:30.543508: W tensorflow/core/common_runtime/eager/context.cc:371] Added two functions with the same name: cond_png_false_79
2019-09-07 21:06:30.543560: W tensorflow/core/common_runtime/eager/context.cc:371] Added two functions with the same name: cond_gif_true_89
2019-09-07 21:06:30.543624: W tensorflow/core/common_runtime/eager/context.cc:371] Added two functions with the same name: cond_gif_false_90
2019-09-07 21:06:30.550423: W tensorflow/core/common_runtime/eager/context.cc:371] Added two functions with the same name: decode_image_cond_jpeg_true_59
2019-09-07 21:06:30.550512: W tensorflow/core/common_runtime/eager/context.cc:371] Added two functions with the same name: decode_image_cond_jpeg_false_60
2019-09-07 21:06:30.550574: W tensorflow/core/common_runtime/eager/context.cc:371] Added two functions with the same name: cond_png_true_78
2019-09-07 21:06:30.550607: W tensorflow/core/common_runtime/eager/context.cc:371] Added two functions with the same name: cond_png_false_79
2019-09-07 21:06:30.550649: W tensorflow/core/common_runtime/eager/context.cc:371] Added two functions with the same name: cond_gif_true_89
2019-09-07 21:06:30.550708: W tensorflow/core/common_runtime/eager/context.cc:371] Added two functions with the same name: cond_gif_false_90
I0907 21:06:30.553653 139893622470464 dataset_builder.py:184] Overwrite dataset info from restored data version.
I0907 21:06:30.556012 139893622470464 dataset_builder.py:253] Reusing dataset mnist (/usr/local/google/tmp/trax/mnist/1.0.0)
I0907 21:06:30.556189 139893622470464 dataset_builder.py:399] Constructing tf.data.Dataset for split test, from /usr/local/google/tmp/trax/mnist/1.0.0
2019-09-07 21:06:30.579470: W tensorflow/core/common_runtime/eager/context.cc:371] Added two functions with the same name: __inference_Dataset_flat_map_read_one_file_154
2019-09-07 21:06:30.579543: W tensorflow/core/common_runtime/eager/context.cc:371] Added two functions with the same name: __inference_Dataset_map_ExampleParser.parse_example_168
2019-09-07 21:06:30.635790: W tensorflow/core/common_runtime/eager/context.cc:371] Added two functions with the same name: cond_gif_true_218
2019-09-07 21:06:30.635885: W tensorflow/core/common_runtime/eager/context.cc:371] Added two functions with the same name: cond_gif_false_219
2019-09-07 21:06:30.641705: W tensorflow/core/common_runtime/eager/context.cc:371] Added two functions with the same name: cond_png_true_207
2019-09-07 21:06:30.641761: W tensorflow/core/common_runtime/eager/context.cc:371] Added two functions with the same name: cond_png_false_208
2019-09-07 21:06:30.641813: W tensorflow/core/common_runtime/eager/context.cc:371] Added two functions with the same name: cond_gif_true_218
2019-09-07 21:06:30.641878: W tensorflow/core/common_runtime/eager/context.cc:371] Added two functions with the same name: cond_gif_false_219
2019-09-07 21:06:30.648735: W tensorflow/core/common_runtime/eager/context.cc:371] Added two functions with the same name: decode_image_cond_jpeg_true_188
2019-09-07 21:06:30.648824: W tensorflow/core/common_runtime/eager/context.cc:371] Added two functions with the same name: decode_image_cond_jpeg_false_189
2019-09-07 21:06:30.648884: W tensorflow/core/common_runtime/eager/context.cc:371] Added two functions with the same name: cond_png_true_207
2019-09-07 21:06:30.648917: W tensorflow/core/common_runtime/eager/context.cc:371] Added two functions with the same name: cond_png_false_208
2019-09-07 21:06:30.648959: W tensorflow/core/common_runtime/eager/context.cc:371] Added two functions with the same name: cond_gif_true_218
2019-09-07 21:06:30.649018: W tensorflow/core/common_runtime/eager/context.cc:371] Added two functions with the same name: cond_gif_false_219
W0907 21:06:30.658030 139893622470464 deprecation_wrapper.py:119] From /usr/local/google/home/jlebar/code/tensor2tensor/tensor2tensor/trax/inputs.py:332: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.
I0907 21:06:30.658161 139893622470464 inputs.py:333] Heuristically setting bucketing to False based on shapes of target tensors.
I0907 21:06:30.668358 139893622470464 inputs.py:333] Heuristically setting bucketing to False based on shapes of target tensors.
I0907 21:06:30.675841 139893622470464 inputs.py:333] Heuristically setting bucketing to False based on shapes of target tensors.
2019-09-07 21:06:30.679798: E external/org_tensorflow/tensorflow/stream_executor/stream.cc:332] Error recording event in stream: error recording CUDA event on stream 0x73329b0: CUDA_ERROR_INVALID_HANDLE: invalid resource handle; not marking stream as bad, as the Event object may be at fault. Monitor for further errors.
2019-09-07 21:06:30.680048: E external/org_tensorflow/tensorflow/stream_executor/stream.cc:332] Error recording event in stream: error recording CUDA event on stream 0x73329b0: CUDA_ERROR_INVALID_HANDLE: invalid resource handle; not marking stream as bad, as the Event object may be at fault. Monitor for further errors.
2019-09-07 21:06:30.680270: E external/org_tensorflow/tensorflow/stream_executor/stream.cc:332] Error recording event in stream: error recording CUDA event on stream 0x73329b0: CUDA_ERROR_INVALID_HANDLE: invalid resource handle; not marking stream as bad, as the Event object may be at fault. Monitor for further errors.
2019-09-07 21:06:30.680468: E external/org_tensorflow/tensorflow/stream_executor/stream.cc:332] Error recording event in stream: error recording CUDA event on stream 0x73329b0: CUDA_ERROR_INVALID_HANDLE: invalid resource handle; not marking stream as bad, as the Event object may be at fault. Monitor for further errors.
2019-09-07 21:06:30.680650: E external/org_tensorflow/tensorflow/stream_executor/stream.cc:332] Error recording event in stream: error recording CUDA event on stream 0x73329b0: CUDA_ERROR_INVALID_HANDLE: invalid resource handle; not marking stream as bad, as the Event object may be at fault. Monitor for further errors.
2019-09-07 21:06:30.681040: E external/org_tensorflow/tensorflow/stream_executor/stream.cc:332] Error recording event in stream: error recording CUDA event on stream 0x73329b0: CUDA_ERROR_INVALID_HANDLE: invalid resource handle; not marking stream as bad, as the Event object may be at fault. Monitor for further errors.
2019-09-07 21:06:30.681240: E external/org_tensorflow/tensorflow/stream_executor/stream.cc:332] Error recording event in stream: error recording CUDA event on stream 0x73329b0: CUDA_ERROR_INVALID_HANDLE: invalid resource handle; not marking stream as bad, as the Event object may be at fault. Monitor for further errors.
2019-09-07 21:06:31.035813: E external/org_tensorflow/tensorflow/stream_executor/stream.cc:332] Error recording event in stream: error recording CUDA event on stream 0x73329b0: CUDA_ERROR_INVALID_HANDLE: invalid resource handle; not marking stream as bad, as the Event object may be at fault. Monitor for further errors.
2019-09-07 21:06:31.201710: E external/org_tensorflow/tensorflow/compiler/xla/python/local_client.cc:731] Execution of replica 0 failed: Internal: failed to load in-memory CUBIN: CUDA_ERROR_INVALID_SOURCE: device kernel image is invalid
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/google/home/jlebar/code/tensor2tensor/tensor2tensor/trax/trainer.py", line 133, in <module>
    app.run(main)
  File "/usr/local/google/home/jlebar/.local/lib/python3.6/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/usr/local/google/home/jlebar/.local/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/usr/local/google/home/jlebar/code/tensor2tensor/tensor2tensor/trax/trainer.py", line 129, in main
    trax.train(output_dir=output_dir)
  File "/usr/local/google/home/jlebar/.local/lib/python3.6/site-packages/gin/config.py", line 1073, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/usr/local/google/home/jlebar/.local/lib/python3.6/site-packages/gin/utils.py", line 49, in augment_exception_message_and_reraise
    six.raise_from(proxy.with_traceback(exception.__traceback__), None)
  File "<string>", line 3, in raise_from
  File "/usr/local/google/home/jlebar/.local/lib/python3.6/site-packages/gin/config.py", line 1050, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/usr/local/google/home/jlebar/code/tensor2tensor/tensor2tensor/trax/trax.py", line 878, in train
    save_steps=save_steps, has_weights=has_weights)
  File "/usr/local/google/home/jlebar/.local/lib/python3.6/site-packages/gin/config.py", line 1073, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/usr/local/google/home/jlebar/.local/lib/python3.6/site-packages/gin/utils.py", line 49, in augment_exception_message_and_reraise
    six.raise_from(proxy.with_traceback(exception.__traceback__), None)
  File "<string>", line 3, in raise_from
  File "/usr/local/google/home/jlebar/.local/lib/python3.6/site-packages/gin/config.py", line 1050, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/usr/local/google/home/jlebar/code/tensor2tensor/tensor2tensor/trax/trax.py", line 551, in __init__
    rng, init_rng = jax_random.split(rng)
  File "/usr/local/google/home/jlebar/code/tensor2tensor/tensor2tensor/trax/backend.py", line 256, in split
    return backend()["random_split"](prng, num)
  File "/usr/local/google/home/jlebar/.local/lib/python3.6/site-packages/jax/random.py", line 193, in split
    return _split(key, num)
  File "/usr/local/google/home/jlebar/.local/lib/python3.6/site-packages/jax/api.py", line 147, in f_jitted
    out = xla.xla_call(flat_fun, *args_flat, device_assignment=device_assignment, backend=backend)
  File "/usr/local/google/home/jlebar/.local/lib/python3.6/site-packages/jax/core.py", line 569, in call_bind
    outs = primitive.impl(f, *args, **params)
  File "/usr/local/google/home/jlebar/.local/lib/python3.6/site-packages/jax/interpreters/xla.py", line 368, in _xla_call_impl
    return compiled_fun(*args)
  File "/usr/local/google/home/jlebar/.local/lib/python3.6/site-packages/jax/interpreters/xla.py", line 402, in _execute_compiled
    out_bufs = compiled.Execute(input_bufs).destructure()
RuntimeError: Internal: failed to load in-memory CUBIN: CUDA_ERROR_INVALID_SOURCE: device kernel image is invalid
In call to configurable 'Trainer' (<class 'tensor2tensor.trax.trax.Trainer'>)
In call to configurable 'train' (<function train at 0x7f3b2c3fa8c8>)