Running an R script on a GPU (Ubuntu)

shamimashrafiyan · September 12, 2022, 11:27am

I want to run my R script on a GPU server. The script trains a regression model with Keras, and I used parallelization to speed it up. Before running the model I did the following steps:
First I installed Miniconda,
*Created a Conda environment
*Activated the Conda environment
*Installed PyTorch, pip, and TensorFlow (via conda install tensorflow-gpu):
**conda install cudatoolkit=11.2 -c conda-forge
**pip install torch
**conda install tensorflow-gpu
**conda install cudatoolkit=11.1 pytorch=1.8.1 python=3.8 torchvision -c pytorch -c conda-forge
**pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.8.0+cu111.html

Then I ran my script.
The problem is that the GPU is not being used because TensorFlow cannot access it, so the script runs as if on a CPU and takes too long. As far as I can tell, I could not install the tensorflow-gpu build that R uses.
These are the warnings I get when running the script:

#####---
2022-09-12 13:13:30.434379: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib/R/lib:/usr/lib/x86_64-linux-gnu:/usr/lib/jvm/default-java/lib/server:/usr/lib/R/lib:/usr/lib/x86_64-linux-gnu:/usr/lib/jvm/default-java/lib/server

2022-09-12 13:13:30.434395: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at GPU support | TensorFlow for how to download and setup the required libraries for your platform.

Skipping registering GPU devices...

2022-09-12 13:13:30.434750: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
###-------

Note that the script did run, but the running time did not improve because the GPU was not used, as the warnings above show.
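The first warning says that the dynamic loader could not open libcudnn.so.8 in any directory on LD_LIBRARY_PATH. A minimal sketch of how that could be checked outside TensorFlow, using only the Python standard library, is shown here. The function names are hypothetical, and the check is only meaningful when it runs in the same environment (same LD_LIBRARY_PATH) as the R session.

```python
import ctypes
import os


def can_load(library_name):
    """Return True if the dynamic loader can open the shared library."""
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        # Raised when the loader cannot find or open the file.
        return False


def search_path_entries():
    """Return the LD_LIBRARY_PATH directories in order, without duplicates."""
    seen = []
    for entry in os.environ.get("LD_LIBRARY_PATH", "").split(":"):
        if entry and entry not in seen:
            seen.append(entry)
    return seen


if __name__ == "__main__":
    # libcudnn.so.8 is the name taken from the warning above.
    name = "libcudnn.so.8"
    print(name, "loadable:", can_load(name))
    print("Directories searched via LD_LIBRARY_PATH:")
    for directory in search_path_entries():
        print("  ", directory)
```

If the library is reported as not loadable, the printed directory list shows where the loader was told to look, which may help decide whether the library is missing or just installed somewhere else.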

Is there anybody who knows how to fix the problem?
Should I add my code here?

nirgrahamuk · September 12, 2022, 11:47am

I think the "Could not load dynamic library 'libcudnn.so.8'" warning is a clue. I recommend searching for how to install libcudnn on your system.

michaelmayer · September 12, 2022, 11:54am

This library (libcudnn) is missing - you should be able to add it to your conda env via conda install cudnn (cf. Cudnn :: Anaconda.org)
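Conda places shared libraries under the lib directory of the active environment, while the LD_LIBRARY_PATH printed in the warning lists only system directories. So even after conda install cudnn, the loader may not look in the right place. The sketch below, with hypothetical function names, checks both things: whether libcudnn files exist under $CONDA_PREFIX/lib, and whether that directory is on LD_LIBRARY_PATH. It assumes the conda environment is activated so that CONDA_PREFIX is set.

```python
import glob
import os


def find_cudnn(prefix):
    """List libcudnn shared objects under <prefix>/lib, sorted by name."""
    pattern = os.path.join(prefix, "lib", "libcudnn.so*")
    return sorted(glob.glob(pattern))


def path_contains(directory, ld_library_path):
    """Return True if directory is one of the colon-separated entries."""
    entries = [os.path.normpath(p) for p in ld_library_path.split(":") if p]
    return os.path.normpath(directory) in entries


if __name__ == "__main__":
    prefix = os.environ.get("CONDA_PREFIX")
    if prefix is None:
        print("No active conda environment (CONDA_PREFIX is not set).")
    else:
        lib_dir = os.path.join(prefix, "lib")
        print("cuDNN files:", find_cudnn(prefix) or "none found")
        on_path = path_contains(lib_dir, os.environ.get("LD_LIBRARY_PATH", ""))
        print(lib_dir, "on LD_LIBRARY_PATH:", on_path)
```

If the files exist but the directory is not on the path, one commonly suggested remedy is to add $CONDA_PREFIX/lib to LD_LIBRARY_PATH before starting R; whether that is sufficient here is an assumption.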

shamimashrafiyan · September 12, 2022, 3:46pm

Thank you. I installed it, but I got the same errors again. During installation it printed some lines saying that this package conflicts with other packages, but in the end it showed the version of the installed package, please see the

shamimashrafiyan · September 13, 2022, 1:24pm

Update:
I fixed the previous error, but now I get this one:
failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED

tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(4, 440), b.shape=(440, 221), m=4, n=221, k=440 [[node sequential/dense_1/MatMul (defined at /keras/engine/training.py:1100) ]] [Op:__inference_train_function_594]
Do you have any ideas?
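CUBLAS_STATUS_NOT_INITIALIZED is often reported when cuBLAS cannot get GPU memory, for example because another process (or another parallel worker) already holds most of it. One thing that is commonly tried is to let TensorFlow allocate GPU memory on demand instead of reserving it all at startup. The sketch below uses the Python API that R's keras calls underneath; the function name is hypothetical, and whether this resolves the error in this particular setup is an assumption.

```python
def enable_memory_growth():
    """Ask TensorFlow to allocate GPU memory on demand.

    Returns the names of the GPUs that were configured, or None if
    TensorFlow cannot be imported in this environment.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None

    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # Must run before any operation touches the GPU; otherwise
        # TensorFlow raises a RuntimeError.
        tf.config.experimental.set_memory_growth(gpu, True)
    return [gpu.name for gpu in gpus]


if __name__ == "__main__":
    configured = enable_memory_growth()
    if configured is None:
        print("TensorFlow is not installed in this environment.")
    elif not configured:
        print("TensorFlow sees no GPU; check the CUDA/cuDNN libraries first.")
    else:
        print("Memory growth enabled for:", configured)
```

An empty list means TensorFlow still does not see any GPU, in which case the library loading problem has to be solved before memory settings matter.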

michaelmayer · September 19, 2022, 10:36am

github.com/tensorflow/tensorflow

Issue: failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED

Opened 03:33PM - 27 Apr 17 UTC by traderbagel, closed 02:08AM - 07 May 20 UTC

Labels: stat:awaiting response, type:build/install, stale, TF 2.1

### System information

- **OS Platform and Distribution (e.g., Linux Ubuntu 16.…04)**:
```
Linux 4.4.0-75-generic #96-Ubuntu SMP Thu Apr 20 09:56:33 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
```
- **TensorFlow version (use command below)**:
```
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
v1.0.0-65-g4763edf-dirty 1.0.1
```
- **CUDA/cuDNN version**: 8.0
- **GPU model and memory**:
```
name: GeForce GTX 980
major: 5 minor: 2 memoryClockRate (GHz) 1.2785
pciBusID 0000:01:00.0
Total memory: 3.94GiB
Free memory: 145.50MiB
```

### Describe the problem

If I change the order of device usage, it would report error

### Source code / logs

- If I use GPU first then CPU, it would be fine
```
with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)
with tf.device('/cpu:0'):
    e = tf.constant([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18], shape=[2, 9],dtype=tf.float32, name='e')
    f = tf.matmul(c,e)
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run(f))
```
- But if I use CPU first then GPU, it return error
```
with tf.device('/cpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)
with tf.device('/gpu:0'):
    e = tf.constant([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18], shape=[2, 9],dtype=tf.float32, name='e')
    f = tf.matmul(c,e)
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run(f))
```
- the error dump below
```
    print(sess.run(f))
  File "/home/pika/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 767, in run
    run_metadata_ptr)
  File "/home/pika/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 965, in _run
    feed_dict_string, options, run_metadata)
  File "/home/pika/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1015, in _do_run
    target_list, options, run_metadata)
  File "/home/pika/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1035, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas SGEMM launch failed : a.shape=(2, 2), b.shape=(2, 9), m=2, n=9, k=2
  [[Node: MatMul_1 = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](MatMul/_1, e)]]

Caused by op 'MatMul_1', defined at:
  File "m1_n0teb00k/tensorflow_palyground.py", line 13, in <module>
    f = tf.matmul(c,e)
  File "/home/pika/.local/lib/python3.5/site-packages/tensorflow/python/ops/math_ops.py", line 1765, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "/home/pika/.local/lib/python3.5/site-packages/tensorflow/python/ops/gen_math_ops.py", line 1454, in _mat_mul
    transpose_b=transpose_b, name=name)
  File "/home/pika/.local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
    op_def=op_def)
  File "/home/pika/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2327, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/pika/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1226, in __init__
    self._traceback = _extract_stack()

InternalError (see above for traceback): Blas SGEMM launch failed : a.shape=(2, 2), b.shape=(2, 9), m=2, n=9, k=2
  [[Node: MatMul_1 = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](MatMul/_1, e)]]
```
