Python TensorFlow runs slowly on the GPU: what am I doing wrong?

I have two-dimensional arrays of integers. To speed up my program, I wanted to use the GPU for computations on them. However, the GPU turns out to be no faster, or only slightly faster, than the CPU. I understand that moving tensors into GPU memory can take time and significantly slow the program down, but that does not seem to be the cause here. I am new to TensorFlow, so most likely I am simply misunderstanding something and doing it wrong. Why am I not getting a speedup from using the GPU?

I am using: TensorFlow 2.10.1, CUDA 11.2, cuDNN 8.1.0

I wrote test code that is meant to place 3 tensors in GPU memory and then perform some operations on them. The GPU performs the computation only 10-20% faster. Here is the code I used for the tests:

import random
import time

import tensorflow as tf
from tensorflow.python.client import device_lib


def get_available_devices():
    # Return the names of all devices TensorFlow can see.
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos]


print(get_available_devices())

# Log the device on which each op is placed.
tf.debugging.set_log_device_placement(True)

N1 = 5000

# Build three N1 x N1 int32 tensors of random integers in [0, 254],
# requesting placement on the GPU.
with tf.device('/GPU:0'):
    Test1 = tf.constant([[int(random.random() * 255) for j in range(N1)] for i in range(N1)], tf.int32)
    Test2 = tf.constant([[int(random.random() * 255) for j in range(N1)] for i in range(N1)], tf.int32)
    Test3 = tf.constant([[int(random.random() * 255) for j in range(N1)] for i in range(N1)], tf.int32)


def function(Test1, Test2, Test3):
    # Sum over both axes of |Test1 - Test2| * Test3 (element-wise).
    A = tf.reduce_sum(tf.math.multiply(tf.abs(tf.subtract(Test1, Test2)), Test3), [0, 1])
    return A


graph_function = tf.function(function)

# Time 10 calls with CPU placement requested.
T = time.time()
with tf.device('/CPU:0'):
    for i in range(10):
        graph_function(Test1, Test2, Test3)
print(time.time() - T)

# Time 10 calls with GPU placement requested.
T = time.time()
with tf.device('/GPU:0'):
    for i in range(10):
        graph_function(Test1, Test2, Test3)
print(time.time() - T)
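
For scale, the amount of data involved can be worked out from N1 and the 4-byte size of an int32 element. This is only a rough sketch: the variable names other than N1 are illustrative and do not appear in the test code, and it counts just the three input tensors, not any intermediate results.

```python
# Size of the three test tensors, assuming 4 bytes per int32 element.
N1 = 5000
BYTES_PER_INT32 = 4

elements_per_tensor = N1 * N1                              # 5000 x 5000 elements
bytes_per_tensor = elements_per_tensor * BYTES_PER_INT32   # one int32 tensor
total_bytes = 3 * bytes_per_tensor                         # Test1, Test2, Test3

print(f"{bytes_per_tensor / 1e6:.0f} MB per tensor, "
      f"{total_bytes / 1e6:.0f} MB for all three")
```

The three inputs together are well below the 5987 MB of GPU memory reported in the log output.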

Output:

2023-03-25 18:07:35.016280: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-25 18:07:35.551527: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /device:GPU:0 with 5987 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 2070 with Max-Q Design, pci bus id: 0000:01:00.0, compute capability: 7.5
['/device:CPU:0', '/device:GPU:0']
2023-03-25 18:07:35.553468: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 5987 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 2070 with Max-Q Design, pci bus id: 0000:01:00.0, compute capability: 7.5
input: (_Arg): /job:localhost/replica:0/task:0/device:CPU:0
_EagerConst: (_EagerConst): /job:localhost/replica:0/task:0/device:GPU:0
output_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:0
2023-03-25 18:07:41.112701: I tensorflow/core/common_runtime/placer.cc:114] input: (_Arg): /job:localhost/replica:0/task:0/device:CPU:0
2023-03-25 18:07:41.112993: I tensorflow/core/common_runtime/placer.cc:114] _EagerConst: (_EagerConst): /job:localhost/replica:0/task:0/device:GPU:0
2023-03-25 18:07:41.113289: I tensorflow/core/common_runtime/placer.cc:114] output_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:0
2023-03-25 18:07:41.119012: I tensorflow/core/common_runtime/eager/execute.cc:1423] Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
2023-03-25 18:07:46.248527: I tensorflow/core/common_runtime/eager/execute.cc:1423] Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
2023-03-25 18:07:51.544318: I tensorflow/core/common_runtime/eager/execute.cc:1423] Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
test1: (_Arg): /job:localhost/replica:0/task:0/device:CPU:0
test2: (_Arg): /job:localhost/replica:0/task:0/device:CPU:0
test3: (_Arg): /job:localhost/replica:0/task:0/device:CPU:0
Sub: (Sub): /job:localhost/replica:0/task:0/device:CPU:0
Abs: (Abs): /job:localhost/replica:0/task:0/device:CPU:0
Mul: (Mul): /job:localhost/replica:0/task:0/device:CPU:0
Sum: (Sum): /job:localhost/replica:0/task:0/device:CPU:0
Identity: (Identity): /job:localhost/replica:0/task:0/device:CPU:0
identity_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:CPU:0
Sum/reduction_indices: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2023-03-25 18:07:51.708549: I tensorflow/core/common_runtime/placer.cc:114] test1: (_Arg): /job:localhost/replica:0/task:0/device:CPU:0
2023-03-25 18:07:51.708823: I tensorflow/core/common_runtime/placer.cc:114] test2: (_Arg): /job:localhost/replica:0/task:0/device:CPU:0
2023-03-25 18:07:51.709093: I tensorflow/core/common_runtime/placer.cc:114] test3: (_Arg): /job:localhost/replica:0/task:0/device:CPU:0
2023-03-25 18:07:51.709370: I tensorflow/core/common_runtime/placer.cc:114] Sub: (Sub): /job:localhost/replica:0/task:0/device:CPU:0
2023-03-25 18:07:51.709632: I tensorflow/core/common_runtime/placer.cc:114] Abs: (Abs): /job:localhost/replica:0/task:0/device:CPU:0
2023-03-25 18:07:51.709901: I tensorflow/core/common_runtime/placer.cc:114] Mul: (Mul): /job:localhost/replica:0/task:0/device:CPU:0
2023-03-25 18:07:51.710157: I tensorflow/core/common_runtime/placer.cc:114] Sum: (Sum): /job:localhost/replica:0/task:0/device:CPU:0
2023-03-25 18:07:51.710428: I tensorflow/core/common_runtime/placer.cc:114] Identity: (Identity): /job:localhost/replica:0/task:0/device:CPU:0
2023-03-25 18:07:51.710723: I tensorflow/core/common_runtime/placer.cc:114] identity_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:CPU:0
2023-03-25 18:07:51.711030: I tensorflow/core/common_runtime/placer.cc:114] Sum/reduction_indices: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2023-03-25 18:07:51.717553: I tensorflow/core/common_runtime/eager/execute.cc:1423] Executing op __inference_function_15 in device /job:localhost/replica:0/task:0/device:CPU:0
2023-03-25 18:07:51.759928: I tensorflow/core/common_runtime/eager/execute.cc:1423] Executing op __inference_function_15 in device /job:localhost/replica:0/task:0/device:CPU:0
2023-03-25 18:07:51.803280: I tensorflow/core/common_runtime/eager/execute.cc:1423] Executing op __inference_function_15 in device /job:localhost/replica:0/task:0/device:CPU:0
2023-03-25 18:07:51.861516: I tensorflow/core/common_runtime/eager/execute.cc:1423] Executing op __inference_function_15 in device /job:localhost/replica:0/task:0/device:CPU:0
2023-03-25 18:07:51.906483: I tensorflow/core/common_runtime/eager/execute.cc:1423] Executing op __inference_function_15 in device /job:localhost/replica:0/task:0/device:CPU:0
2023-03-25 18:07:51.952353: I tensorflow/core/common_runtime/eager/execute.cc:1423] Executing op __inference_function_15 in device /job:localhost/replica:0/task:0/device:CPU:0
2023-03-25 18:07:51.998651: I tensorflow/core/common_runtime/eager/execute.cc:1423] Executing op __inference_function_15 in device /job:localhost/replica:0/task:0/device:CPU:0
2023-03-25 18:07:52.043548: I tensorflow/core/common_runtime/eager/execute.cc:1423] Executing op __inference_function_15 in device /job:localhost/replica:0/task:0/device:CPU:0
2023-03-25 18:07:52.089060: I tensorflow/core/common_runtime/eager/execute.cc:1423] Executing op __inference_function_15 in device /job:localhost/replica:0/task:0/device:CPU:0
2023-03-25 18:07:52.135357: I tensorflow/core/common_runtime/eager/execute.cc:1423] Executing op __inference_function_15 in device /job:localhost/replica:0/task:0/device:CPU:0
0.5734655857086182
test1: (_Arg): /job:localhost/replica:0/task:0/device:CPU:0
test2: (_Arg): /job:localhost/replica:0/task:0/device:CPU:0
test3: (_Arg): /job:localhost/replica:0/task:0/device:CPU:0
Sub: (Sub): /job:localhost/replica:0/task:0/device:GPU:0
Abs: (Abs): /job:localhost/replica:0/task:0/device:GPU:0
Mul: (Mul): /job:localhost/replica:0/task:0/device:GPU:0
Sum: (Sum): /job:localhost/replica:0/task:0/device:GPU:0
Identity: (Identity): /job:localhost/replica:0/task:0/device:GPU:0
identity_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:0
Sum/reduction_indices: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2023-03-25 18:07:52.182973: I tensorflow/core/common_runtime/placer.cc:114] test1: (_Arg): /job:localhost/replica:0/task:0/device:CPU:0
2023-03-25 18:07:52.183241: I tensorflow/core/common_runtime/placer.cc:114] test2: (_Arg): /job:localhost/replica:0/task:0/device:CPU:0
2023-03-25 18:07:52.183500: I tensorflow/core/common_runtime/placer.cc:114] test3: (_Arg): /job:localhost/replica:0/task:0/device:CPU:0
2023-03-25 18:07:52.183758: I tensorflow/core/common_runtime/placer.cc:114] Sub: (Sub): /job:localhost/replica:0/task:0/device:GPU:0
2023-03-25 18:07:52.184010: I tensorflow/core/common_runtime/placer.cc:114] Abs: (Abs): /job:localhost/replica:0/task:0/device:GPU:0
2023-03-25 18:07:52.184259: I tensorflow/core/common_runtime/placer.cc:114] Mul: (Mul): /job:localhost/replica:0/task:0/device:GPU:0
2023-03-25 18:07:52.184517: I tensorflow/core/common_runtime/placer.cc:114] Sum: (Sum): /job:localhost/replica:0/task:0/device:GPU:0
2023-03-25 18:07:52.184793: I tensorflow/core/common_runtime/placer.cc:114] Identity: (Identity): /job:localhost/replica:0/task:0/device:GPU:0
2023-03-25 18:07:52.185097: I tensorflow/core/common_runtime/placer.cc:114] identity_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:0
2023-03-25 18:07:52.185411: I tensorflow/core/common_runtime/placer.cc:114] Sum/reduction_indices: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2023-03-25 18:07:52.190788: I tensorflow/core/common_runtime/eager/execute.cc:1423] Executing op __inference_function_15 in device /job:localhost/replica:0/task:0/device:GPU:0
2023-03-25 18:07:52.241576: I tensorflow/core/common_runtime/eager/execute.cc:1423] Executing op __inference_function_15 in device /job:localhost/replica:0/task:0/device:GPU:0
2023-03-25 18:07:52.290486: I tensorflow/core/common_runtime/eager/execute.cc:1423] Executing op __inference_function_15 in device /job:localhost/replica:0/task:0/device:GPU:0
2023-03-25 18:07:52.338611: I tensorflow/core/common_runtime/eager/execute.cc:1423] Executing op __inference_function_15 in device /job:localhost/replica:0/task:0/device:GPU:0
2023-03-25 18:07:52.384468: I tensorflow/core/common_runtime/eager/execute.cc:1423] Executing op __inference_function_15 in device /job:localhost/replica:0/task:0/device:GPU:0
2023-03-25 18:07:52.433736: I tensorflow/core/common_runtime/eager/execute.cc:1423] Executing op __inference_function_15 in device /job:localhost/replica:0/task:0/device:GPU:0
2023-03-25 18:07:52.485451: I tensorflow/core/common_runtime/eager/execute.cc:1423] Executing op __inference_function_15 in device /job:localhost/replica:0/task:0/device:GPU:0
2023-03-25 18:07:52.531441: I tensorflow/core/common_runtime/eager/execute.cc:1423] Executing op __inference_function_15 in device /job:localhost/replica:0/task:0/device:GPU:0
2023-03-25 18:07:52.577985: I tensorflow/core/common_runtime/eager/execute.cc:1423] Executing op __inference_function_15 in device /job:localhost/replica:0/task:0/device:GPU:0
2023-03-25 18:07:52.623322: I tensorflow/core/common_runtime/eager/execute.cc:1423] Executing op __inference_function_15 in device /job:localhost/replica:0/task:0/device:GPU:0
0.4886949062347412
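
To express the two totals printed above per call, here is a small calculation. It assumes each total covers exactly the 10 calls in the corresponding loop, including any one-time tracing or placement cost in the first call; the variable names are illustrative.

```python
# Totals printed by the benchmark (seconds for 10 calls each).
cpu_total = 0.5734655857086182
gpu_total = 0.4886949062347412
calls = 10

cpu_per_call_ms = cpu_total / calls * 1000
gpu_per_call_ms = gpu_total / calls * 1000
speedup = cpu_total / gpu_total                     # ratio of CPU time to GPU time
time_saved_pct = (1 - gpu_total / cpu_total) * 100  # reduction in wall time

print(f"CPU: {cpu_per_call_ms:.1f} ms/call, GPU: {gpu_per_call_ms:.1f} ms/call")
print(f"speedup: {speedup:.2f}x, time saved: {time_saved_pct:.1f}%")
```

This works out to a reduction in wall time of roughly 15%, consistent with the 10-20% figure mentioned in the question.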

Answers (0):