

I have a machine with 3x GTX 1080 GPUs. Here is the training code:

    dynamic_learning_rate = tf.placeholder(tf.float32, shape=[])
    model_version = tf.constant(1, tf.int32)

    with tf.device('/cpu:0'):
        with tf.name_scope('Input'):
            # Input images and labels.
            batch_labels = self.get_batch()

    grads = []
    pred = []
    cost = []

    # Define optimizer
    optimizer = tf.train.MomentumOptimizer(learning_rate=dynamic_learning_rate / self.batch_size,

    split_input_image = tf.split(batch_images, self.num_gpus)
    split_input_vector = tf.split(batch_input_vectors, self.num_gpus)
    split_input_one_hot_label = tf.split(batch_one_hot_labels, self.num_gpus)
    for i in range(self.num_gpus):
        with tf.device(tf.DeviceSpec(device_type="GPU", device_index=i)):
            with tf.variable_scope(tf.get_variable_scope(), reuse=i > 0):
                with tf.name_scope('Model'):
                    # Construct model
                    with tf.variable_scope("inference"):
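                        tower_pred = self.model(split_input_image[i], split_input_vector[i], is_training=True)
                        # Assumed missing line: collect this tower's logits for the tf.concat below.
                        pred.append(tower_pred)
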
                with tf.name_scope('Loss'):
                    # Define loss and optimizer
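                    softmax_cross_entropy_cost = tf.reduce_mean(
                        tf.nn.softmax_cross_entropy_with_logits(logits=tower_pred, labels=split_input_one_hot_label[i]))
                    # Assumed missing line: collect this tower's loss for the tf.reduce_mean below.
                    cost.append(softmax_cross_entropy_cost)
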
    # Concat variables
    pred = tf.concat(pred, 0)
    cost = tf.reduce_mean(cost)

    # L2 regularization
    trainable_vars = tf.trainable_variables()
    l2_regularization = tf.add_n(
        [tf.nn.l2_loss(v) for v in trainable_vars if any(x in v.name for x in ['weights', 'biases'])])
    for v in trainable_vars:
        if any(x in v.name for x in ['weights', 'biases']):
            print(v.name + ' - included for L2 regularization!')

    cost = cost + self.l2_regularization_strength*l2_regularization

    with tf.name_scope('Accuracy'):
        # Evaluate model
        correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(batch_one_hot_labels, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
        prediction = tf.nn.softmax(pred, name='softmax')

    # Creates a variable to hold the global_step.
    global_step = tf.Variable(0, trainable=False, name='global_step')

    # Minimization
    update = optimizer.minimize(cost, global_step=global_step, colocate_gradients_with_ops=True)
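
The unused `grads = []` list above suggests the common tower pattern in which each GPU computes its own gradients and the results are averaged before a single update (in TensorFlow 1.x this is usually done with `optimizer.compute_gradients` per tower and one `optimizer.apply_gradients` call). The following is only a minimal sketch of the averaging step, not the code from the question: gradients are plain Python lists standing in for tensors, and the function name `average_tower_gradients` and the sample values are hypothetical.

```python
# Hypothetical illustration: plain lists of floats stand in for gradient tensors.

def average_tower_gradients(tower_grads):
    """Average gradients variable by variable across towers.

    tower_grads: one list per tower, each a list of (gradient, var_name)
    pairs, with every tower listing the variables in the same order.
    """
    averaged = []
    # zip(*tower_grads) groups the entries for the same variable across towers.
    for per_var in zip(*tower_grads):
        var_name = per_var[0][1]
        grads = [g for g, _ in per_var]
        n = len(grads)
        # Element-wise mean over the towers.
        mean_grad = [sum(vals) / n for vals in zip(*grads)]
        averaged.append((mean_grad, var_name))
    return averaged


# Made-up gradients for three towers (one per GPU).
tower_grads = [
    [([1.0, 2.0], "weights"), ([0.5], "biases")],  # tower on GPU 0
    [([3.0, 4.0], "weights"), ([1.5], "biases")],  # tower on GPU 1
    [([5.0, 6.0], "weights"), ([2.5], "biases")],  # tower on GPU 2
]
averaged = average_tower_gradients(tower_grads)
print(averaged)
```

Whether restructuring the update this way changes the throughput reported below is not something the question establishes; the sketch only shows what the averaging step computes.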


    Fri Nov 10 12:28:00 2017
    | NVIDIA-SMI 384.90                 Driver Version: 384.90                    |
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |   0  GeForce GTX 1080    Off  | 00000000:03:00.0 Off |                  N/A |
    | 42%   65C    P2    62W / 198W |   7993MiB /  8114MiB |    100%      Default |
    |   1  GeForce GTX 1080    Off  | 00000000:04:00.0 Off |                  N/A |
    | 33%   53C    P2   150W / 198W |   7886MiB /  8114MiB |      0%      Default |
    |   2  GeForce GTX 1080    Off  | 00000000:05:00.0  On |                  N/A |
    | 26%   54C    P2   170W / 198W |   7883MiB /  8108MiB |      0%      Default |

    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |    0     23228      C   python                                      7982MiB |
    |    1     23228      C   python                                      7875MiB |
    |    2      4793      G   /usr/lib/xorg/Xorg                            40MiB |
    |    2     23228      C   python                                      7831MiB |

    Fri Nov 10 12:28:36 2017
    | NVIDIA-SMI 384.90                 Driver Version: 384.90                    |
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |   0  GeForce GTX 1080    Off  | 00000000:03:00.0 Off |                  N/A |
    | 42%   59C    P2    54W / 198W |   7993MiB /  8114MiB |      0%      Default |
    |   1  GeForce GTX 1080    Off  | 00000000:04:00.0 Off |                  N/A |
    | 33%   57C    P2   154W / 198W |   7886MiB /  8114MiB |    100%      Default |
    |   2  GeForce GTX 1080    Off  | 00000000:05:00.0  On |                  N/A |
    | 27%   55C    P2   155W / 198W |   7883MiB /  8108MiB |    100%      Default |

    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |    0     23228      C   python                                      7982MiB |
    |    1     23228      C   python                                      7875MiB |
    |    2      4793      G   /usr/lib/xorg/Xorg                            40MiB |
    |    2     23228      C   python                                      7831MiB |
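
The two snapshots above show different GPUs at 100% and 0% utilization at different moments. Two samples are not much to go on, so one way to check this more systematically is to log utilization repeatedly in CSV form, for example with `nvidia-smi --query-gpu=index,utilization.gpu --format=csv,noheader,nounits`, and parse each sample. The sketch below covers only the parsing step; the function name `parse_gpu_utilization` is hypothetical, and the sample string is hand-written to mirror the first snapshot rather than captured from a real run.

```python
import csv
import io


def parse_gpu_utilization(csv_text):
    """Parse 'index, utilization.gpu' CSV rows into {gpu_index: percent}."""
    util = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        index, percent = (field.strip() for field in row)
        util[int(index)] = int(percent)
    return util


# Hand-written sample mirroring the first snapshot (GPU 0 busy, GPUs 1 and 2 idle).
sample = "0, 100\n1, 0\n2, 0\n"
snapshot = parse_gpu_utilization(sample)
busy_gpus = [index for index, percent in snapshot.items() if percent > 0]
print(snapshot, busy_gpus)
```

Collecting many such samples over a training run would show whether the GPUs are ever busy at the same time or mostly take turns.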


With a single GPU, the training speed is around 650 images/second; with all 3 GPUs I get only 1050 images/second.
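
To put these two numbers in perspective, the short calculation below derives the speedup and per-GPU scaling efficiency. It also estimates the parallel fraction that Amdahl's law would imply. Applying Amdahl's law here is an assumption made for illustration (the question is tagged `parallelism-amdahl`), and the variable names are hypothetical.

```python
single_gpu_rate = 650.0   # images/second on one GPU (from the question)
multi_gpu_rate = 1050.0   # images/second on all three GPUs (from the question)
num_gpus = 3

# Observed speedup relative to one GPU, and the fraction of ideal linear scaling.
speedup = multi_gpu_rate / single_gpu_rate
efficiency = speedup / num_gpus

# Amdahl's law: speedup = 1 / ((1 - p) + p / n).
# Solving for the parallel fraction p: p = (1 - 1/speedup) / (1 - 1/n).
parallel_fraction = (1 - 1 / speedup) / (1 - 1 / num_gpus)

print(f"speedup: {speedup:.2f}x")
print(f"efficiency: {efficiency:.0%}")
print(f"implied parallel fraction: {parallel_fraction:.2f}")
```

Under that simple model, the measured rates correspond to a speedup of about 1.6x, roughly 54% of ideal scaling, and a parallel fraction of about 0.57.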


parallel-processing deep-learning tensorflow-gpu multi-gpu parallelism-amdahl

