# Chapter. Image Recognition

This example uses [TensorFlow.NET](https://github.com/SciSharp/TensorFlow.NET) and [NumSharp](https://github.com/SciSharp/NumSharp) for image recognition. It runs a pre-trained Inception model on an image and prints the predicted categories sorted by probability. The original paper is [here](https://arxiv.org/pdf/1512.00567.pdf). The Inception architecture of GoogLeNet was designed to perform well even under strict constraints on memory and computational budget. The computational cost of Inception is also much lower than that of VGGNet or its higher-performing successors. This has made it feasible to use Inception networks in big-data scenarios, where huge amounts of data need to be processed at reasonable cost, and in scenarios where memory or computational capacity is inherently limited, for example in mobile vision settings.

The GoogLeNet architecture follows these design principles:

* Avoid representational bottlenecks, especially early in the network.
* Higher dimensional representations are easier to process locally within a network.
* Spatial aggregation can be done over lower dimensional embeddings without much or any loss in representational power.
* Balance the width and depth of the network.
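
The third principle can be made concrete with a little arithmetic. The sketch below counts the weights of a 3x3 convolution applied directly, and of the same convolution preceded by a 1x1 layer that first reduces the number of channels. The `ConvCost` helper and the channel counts used in the note after it (256 and 64) are hypothetical, chosen only for illustration; they are not taken from the actual Inception v3 configuration.

```csharp
using System;

public static class ConvCost
{
    // Number of weights in a kernel x kernel convolution (bias terms ignored).
    public static long Weights(int kernel, int inChannels, int outChannels)
        => (long)kernel * kernel * inChannels * outChannels;

    // A 3x3 convolution applied directly to the full-width input.
    public static long Direct(int inChannels, int outChannels)
        => Weights(3, inChannels, outChannels);

    // A 1x1 reduction to fewer channels, followed by the 3x3 convolution.
    public static long Reduced(int inChannels, int reducedChannels, int outChannels)
        => Weights(1, inChannels, reducedChannels)
         + Weights(3, reducedChannels, outChannels);
}
```

With these assumed sizes, `ConvCost.Direct(256, 256)` gives 589,824 weights, while `ConvCost.Reduced(256, 64, 256)` gives 163,840, roughly 3.6 times fewer.
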
#### Let's get started with real code.
##### 1. Prepare data
The example downloads the pre-trained model archive and a sample picture, and extracts the archive automatically. Some external paths are omitted here; please refer to the source code for the real paths.
```csharp
private void PrepareData()
{
    // Make sure the working directory exists.
    Directory.CreateDirectory(dir);

    // get model file (the URL is shortened here; see the source code for the real path)
    string url = "models/inception_v3_2016_08_28_frozen.pb.tar.gz";
    string zipFile = Path.Join(dir, $"{pbFile}.tar.gz");
    Utility.Web.Download(url, zipFile);
    Utility.Compress.ExtractTGZ(zipFile, dir);

    // download sample picture
    string pic = "grace_hopper.jpg";
    Utility.Web.Download($"data/{pic}", Path.Join(dir, pic));
}
```
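
The log output in the last section of this chapter reports `already exists.` for both files, which suggests that the download helper skips files that are already on disk. The implementation of `Utility.Web.Download` is not shown here, so the following is only a sketch of that skip-if-present behavior built on the standard `HttpClient`; the `Downloader` class and `DownloadIfMissing` method are hypothetical names, not part of TensorFlow.NET.

```csharp
using System;
using System.IO;
using System.Net.Http;

public static class Downloader
{
    // Downloads url to destination unless the file is already present.
    // Returns true only when a download actually happened.
    public static bool DownloadIfMissing(string url, string destination)
    {
        if (File.Exists(destination))
        {
            Console.WriteLine($"{destination} already exists.");
            return false;
        }

        using (var client = new HttpClient())
        {
            // Blocking on the async call keeps the sketch short;
            // real code would normally await it instead.
            byte[] bytes = client.GetByteArrayAsync(url).GetAwaiter().GetResult();
            File.WriteAllBytes(destination, bytes);
        }
        return true;
    }
}
```

Checking for the file first makes repeated runs of the example cheap, since the model archive only has to be fetched once.
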
##### 2. Load image file and normalize
We need a sample image to test the pre-trained Inception model. The image is converted into a tensor and then normalized. The pre-trained model takes its input as a 4-dimensional tensor with shape [BATCH_SIZE, INPUT_HEIGHT, INPUT_WIDTH, 3], where:
- BATCH_SIZE allows inference on multiple images in one pass through the graph
- INPUT_HEIGHT is the height of the images on which the model was trained
- INPUT_WIDTH is the width of the images on which the model was trained
- 3 is the number of color channels: the (R, G, B) values of each pixel, represented as floats.
```csharp
private NDArray ReadTensorFromImageFile(string file_name,
                        int input_height = 299,
                        int input_width = 299,
                        int input_mean = 0,
                        int input_std = 255)
{
    return with<Graph, NDArray>(tf.Graph().as_default(), graph =>
    {
        // Read the file and decode it as a 3-channel (RGB) JPEG.
        var file_reader = tf.read_file(file_name, "file_reader");
        var image_reader = tf.image.decode_jpeg(file_reader, channels: 3, name: "jpeg_reader");
        // Cast to float and add a leading batch dimension.
        var caster = tf.cast(image_reader, tf.float32);
        var dims_expander = tf.expand_dims(caster, 0);
        // Resize to the input size the model expects.
        var resize = tf.constant(new int[] { input_height, input_width });
        var bilinear = tf.image.resize_bilinear(dims_expander, resize);
        // Normalize: (pixel - input_mean) / input_std.
        var sub = tf.subtract(bilinear, new float[] { input_mean });
        var normalized = tf.divide(sub, new float[] { input_std });
        // Run the graph and return the result as an NDArray.
        return with<Session, NDArray>(tf.Session(graph), sess => sess.run(normalized));
    });
}
```
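
The `tf.subtract` and `tf.divide` operations above compute `(pixel - input_mean) / input_std` for every channel value. With the defaults used here (mean 0, std 255), that rescales 8-bit values from [0, 255] to [0, 1]. A minimal plain C# sketch of the same arithmetic, independent of TensorFlow.NET, might look as follows; `PixelNormalizer` is a hypothetical helper name and is not part of either library.

```csharp
using System;
using System.Linq;

public static class PixelNormalizer
{
    // Applies (value - mean) / std to every raw 8-bit channel value.
    public static float[] Normalize(byte[] pixels, float mean = 0f, float std = 255f)
        => pixels.Select(p => (p - mean) / std).ToArray();
}
```

For example, `PixelNormalizer.Normalize(new byte[] { 0, 255 })` maps the darkest and brightest possible values to 0 and 1 respectively.
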
##### 3. Load pre-trained model and predict
Load the pre-trained Inception model, which is saved in Google's protobuf file format. Construct a new graph from it, then set the input and output operations in a new session. After running the session, you get a NumPy-like ndarray provided by NumSharp. With NumSharp, you can easily perform various operations on multi-dimensional arrays in the .NET environment.
```csharp
public void Run()
{
    PrepareData();

    // One label per line; the line index is the class id.
    var labels = File.ReadAllLines(Path.Join(dir, labelFile));

    // Load and normalize the sample image.
    var nd = ReadTensorFromImageFile(Path.Join(dir, picFile),
                                    input_height: input_height,
                                    input_width: input_width,
                                    input_mean: input_mean,
                                    input_std: input_std);

    // Import the frozen graph and look up its input and output operations.
    var graph = Graph.ImportFromPB(Path.Join(dir, pbFile));
    var input_operation = graph.get_operation_by_name(input_name);
    var output_operation = graph.get_operation_by_name(output_name);

    // Feed the image tensor and fetch the class probabilities.
    var results = with<Session, NDArray>(tf.Session(graph),
        sess => sess.run(output_operation.outputs[0],
            new FeedItem(input_operation.outputs[0], nd)));

    // Drop the batch dimension, then take the five highest-scoring classes.
    results = np.squeeze(results);
    var argsort = results.argsort<float>();
    var top_k = argsort.Data<float>()
        .Skip(results.size - 5)
        .Reverse()
        .ToArray();

    foreach (float idx in top_k)
        Console.WriteLine($"{picFile}: {idx} {labels[(int)idx]}, {results[(int)idx]}");
}
```
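
The `argsort`, `Skip` and `Reverse` calls above select the five highest-scoring class indices: the indices are sorted by ascending score, the last five are kept, and they are reversed so that the best one comes first. The same selection can be sketched in plain LINQ, without NumSharp. The `TopK.Indices` helper below is a hypothetical name written for illustration only.

```csharp
using System;
using System.Linq;

public static class TopK
{
    // Returns the indices of the k largest scores, highest score first.
    public static int[] Indices(float[] scores, int k)
        => Enumerable.Range(0, scores.Length)
                     .OrderByDescending(i => scores[i])
                     .Take(k)
                     .ToArray();
}
```

For the scores `{ 0.1f, 0.7f, 0.05f, 0.15f }` and `k = 2`, this returns the indices 1 and 3, in that order.
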
##### 4. Print the result
The top prediction is `military uniform`, with a probability of 0.8343058. This is the correct classification.
```powershell
2/18/2019 3:56:18 AM Starting InceptionArchGoogLeNet
label_image_data\inception_v3_2016_08_28_frozen.pb.tar.gz already exists.
label_image_data\grace_hopper.jpg already exists.
2019-02-19 21:56:18.684463: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
create_op: Const 'file_reader/filename', inputs: empty, control_inputs: empty, outputs: file_reader/filename:0
create_op: ReadFile 'file_reader', inputs: file_reader/filename:0, control_inputs: empty, outputs: file_reader:0
create_op: DecodeJpeg 'jpeg_reader', inputs: file_reader:0, control_inputs: empty, outputs: jpeg_reader:0
create_op: Cast 'Cast/Cast', inputs: jpeg_reader:0, control_inputs: empty, outputs: Cast/Cast:0
create_op: Const 'ExpandDims/dim', inputs: empty, control_inputs: empty, outputs: ExpandDims/dim:0
create_op: ExpandDims 'ExpandDims', inputs: Cast/Cast:0, ExpandDims/dim:0, control_inputs: empty, outputs: ExpandDims:0
create_op: Const 'Const', inputs: empty, control_inputs: empty, outputs: Const:0
create_op: ResizeBilinear 'ResizeBilinear', inputs: ExpandDims:0, Const:0, control_inputs: empty, outputs: ResizeBilinear:0
create_op: Const 'y', inputs: empty, control_inputs: empty, outputs: y:0
create_op: Sub 'Sub', inputs: ResizeBilinear:0, y:0, control_inputs: empty, outputs: Sub:0
create_op: Const 'y_1', inputs: empty, control_inputs: empty, outputs: y_1:0
create_op: RealDiv 'truediv', inputs: Sub:0, y_1:0, control_inputs: empty, outputs: truediv:0
grace_hopper.jpg: 653 military uniform, 0.8343058
grace_hopper.jpg: 668 mortarboard, 0.02186947
grace_hopper.jpg: 401 academic gown, 0.01035806
grace_hopper.jpg: 716 pickelhaube, 0.008008132
grace_hopper.jpg: 466 bulletproof vest, 0.005350832
2/18/2019 3:56:25 AM Completed InceptionArchGoogLeNet
```

You can find the full source code on [GitHub](https://github.com/SciSharp/TensorFlow.NET-Examples/tree/master/src/TensorFlowNET.Examples/ImageProcessing).