# Resolve the output node names (strip the ":0" tensor suffixes) for freezing.
output_names = [node.split(":")[0] for node in outputs]

graph = tf.Graph()
with tf.Session(graph=graph) as sess:
    # Restore the SavedModel, then replace every variable with a constant
    # holding its current value.
    tf.saved_model.loader.load(
        sess, meta_graph.meta_info_def.tags, savedmodel_dir)
    frozen_graph_def = tf.graph_util.convert_variables_to_constants(
        sess, graph.as_graph_def(), output_names)

# Serialize the frozen GraphDef to disk and return it.
write_graph_to_file(_GRAPH_FILE, frozen_graph_def, output_dir)
return frozen_graph_def
Freezing a model means pulling the values for all the variables from the latest model file and then
replacing each variable op with a constant that has the numerical data for the weights stored in its
attributes. It then strips away all the extraneous nodes that aren't used for forward inference, and saves
the resulting GraphDef into just a single output file, which is easily deployable for production [14].
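As a minimal, self-contained sketch of what this conversion does (using a hypothetical one-variable toy graph rather than the CheXNet model), the variable-to-constant step can be exercised on its own:

import tensorflow as tf

# Hypothetical toy graph: one variable feeding a named output node.
toy_graph = tf.Graph()
with toy_graph.as_default():
    weights = tf.Variable([1.0, 2.0], name="weights")
    tf.identity(weights * 2.0, name="output")

with tf.Session(graph=toy_graph) as sess:
    sess.run(tf.global_variables_initializer())
    # Variable ops are replaced by Const nodes holding their current values.
    frozen = tf.graph_util.convert_variables_to_constants(
        sess, toy_graph.as_graph_def(), ["output"])

# After freezing, no Variable ops remain in the GraphDef.
print(sorted({node.op for node in frozen.node}))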
Load the frozen graph file from disk:
def get_frozen_graph(graph_file):
    """Read a frozen GraphDef protobuf file from disk."""
    with tf.gfile.FastGFile(graph_file, "rb") as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
    return graph_def
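Once loaded, the GraphDef can be imported into a fresh graph and run directly. The sketch below shows one possible way to do that; the file name and the tensor names "input:0" / "output:0" are placeholders and depend on how the CheXNet graph was actually exported:

import numpy as np
import tensorflow as tf

frozen_graph_def = get_frozen_graph("chexnet_frozen_graph.pb")  # hypothetical path

inference_graph = tf.Graph()
with inference_graph.as_default():
    # Import the frozen nodes into an empty graph without a name prefix.
    tf.import_graph_def(frozen_graph_def, name="")

with tf.Session(graph=inference_graph) as sess:
    input_tensor = inference_graph.get_tensor_by_name("input:0")    # assumed name
    output_tensor = inference_graph.get_tensor_by_name("output:0")  # assumed name
    batch = np.random.rand(1, 224, 224, 3).astype(np.float32)       # dummy image batch
    predictions = sess.run(output_tensor, feed_dict={input_tensor: batch})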
Create and save the GraphDef for TensorRT™ inference using the TensorRT™ library:
def get_trt_graph(graph_name, graph_def, precision_mode, output_dir,
                  output_node, batch_size=128, workspace_size=2<<10):
    trt_graph = trt.create_inference_graph(
        input_graph_def=graph_def,
        outputs=[output_node],