TensorFlow in R - 統計コンサルの議事メモ

TensorFlow became usable from R while I wasn't looking, so I gave it a try. RStudio does good work, as always.

First, install the TensorFlow library from RStudio's GitHub as shown below. Note that this library only calls a TensorFlow that is already installed on the machine from R, so TensorFlow itself has to be installed separately.

## Install the TensorFlow library
devtools::install_github("rstudio/tensorflow")

## Load the libraries
## dplyr and caret for data preparation; glmnet and ranger (random forest) for comparing results
library(tensorflow)
library(dplyr)
library(caret)
library(glmnet)
library(ranger)

Once the installation has finished, check the TensorFlow version. My environment is 0.11.

## Check the TensorFlow version
tf$VERSION
[1] "0.11.0rc0"

Now for "Hello, World!" in TensorFlow. This part simply follows the manual.

## hello, tensorflow
sess  <- tf$Session()
hello <- tf$constant("Hello, TensorFlow!")
sess$run(hello)
[1] "Hello, TensorFlow!"

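Running a simple regression

Next, let's run a simple linear regression with TensorFlow. Once this pattern is clear, extending it to multiple regression should be straightforward.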

## Fix the seed
set.seed(123)

## Create the data. The regression parameters are intercept -12 and slope 1.5 (chosen arbitrarily)
x_data <- runif(100, min=0, max=1)
y_data <- x_data * 1.5 - 12

## Declare the TensorFlow variables
W <- tf$Variable(tf$random_uniform(shape(1L), -1.0, 1.0))
b <- tf$Variable(tf$zeros(shape(1L)))
y <- W * x_data + b

## Define the loss function and configure the optimizer
loss  <- tf$reduce_mean((y-y_data)^2)
opt   <- tf$train$GradientDescentOptimizer(0.2)
train <- opt$minimize(loss)

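With that, the model is set up, so we run it as follows. One thing to watch out for is that the function used for initialization depends on the TensorFlow version. In my environment (0.11) the initialization function is "initialize_all_variables()", whereas newer versions (0.12) appear to use "global_variables_initializer()".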

sess <- tf$Session()

## Initialize the variables. The function to use depends on the TensorFlow version.
# sess$run(tf$global_variables_initializer())
sess$run(tf$initialize_all_variables())
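
The version-dependent choice of initializer described above could be made explicit instead of commenting lines in and out. The following is only a sketch in base R: `initializer_name()` is a hypothetical helper, not part of the tensorflow package, and the 0.12 cutoff is simply the one mentioned in this post.

```r
# Hypothetical helper (not part of the tensorflow package): pick the
# initializer name from a version string such as "0.11.0rc0".
initializer_name <- function(version) {
  # Keep only "major.minor"; suffixes like "rc0" are not accepted by numeric_version()
  major_minor <- regmatches(version, regexpr("^[0-9]+\\.[0-9]+", version))
  # Assumption taken from the post: the new initializer is used from 0.12 on
  if (numeric_version(major_minor) >= numeric_version("0.12")) {
    "global_variables_initializer"
  } else {
    "initialize_all_variables"
  }
}

initializer_name("0.11.0rc0")
initializer_name("0.12.0")

# With a live session, the result could drive the branch, for example:
# if (initializer_name(tf$VERSION) == "global_variables_initializer") {
#   sess$run(tf$global_variables_initializer())
# } else {
#   sess$run(tf$initialize_all_variables())
# }
```

The helper only compares version strings, so it can be tried without TensorFlow installed.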

## Run the training
for (step in 1:1000){
  sess$run(train)
  if (step %% 20 == 0) {
    cat(step, "-", sess$run(W), sess$run(b), "\n")
  }
}
20 - -2.501043 -9.869881 
40 - -0.8908119 -10.72716 
60 - 0.0713779 -11.23942 
80 - 0.646331 -11.54552 
100 - 0.9898927 -11.72843
：
920 - 1.499992 -12 
940 - 1.499992 -12 
960 - 1.499992 -12 
980 - 1.499992 -12 
1000 - 1.499992 -12

I set the learning rate on the small side (0.2), but the parameters converge properly.
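
To see what the training loop is doing, the same fit can be sketched in base R without TensorFlow. This is not how the tensorflow package works internally; it is plain gradient descent on the mean squared error, written by hand with the same data, learning rate, and number of steps as above. The variable names (`w`, `lr`, `grad_w`, `grad_b`) are my own.

```r
set.seed(123)

# Same data as in the post: intercept -12, slope 1.5
x <- runif(100, min = 0, max = 1)
y <- x * 1.5 - 12

# Initial values of the same kind as the TensorFlow variables above
w  <- runif(1, min = -1, max = 1)
b  <- 0
lr <- 0.2  # learning rate, as in GradientDescentOptimizer(0.2)

for (step in 1:1000) {
  resid  <- (w * x + b) - y       # prediction error
  grad_w <- mean(2 * resid * x)   # d/dw of mean((w * x + b - y)^2)
  grad_b <- mean(2 * resid)       # d/db of mean((w * x + b - y)^2)
  w <- w - lr * grad_w
  b <- b - lr * grad_b
}

cat(w, b, "\n")
```

Extending this to multiple regression would mean replacing `w * x` with a matrix product and `grad_w` with a vector of gradients, which is the same change the TensorFlow version would need.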

Multinomial classification

Next, we classify Species using the iris data.

## Multinomial classification of the iris data
W  <- tf$Variable(tf$zeros(shape(4L, 3L)))
x  <- tf$placeholder(tf$float32, shape(NULL, 4L))
b  <- tf$Variable(tf$zeros(3L))
y  <- tf$nn$softmax(tf$matmul(x, W)+b)
y_ <- tf$placeholder(tf$float32, shape(NULL, 3L))

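## Define the loss function
## Note: as written, this is the negative mean probability assigned to the true class;
## the usual softmax cross-entropy would use y_ * tf$log(y) instead
loss  <- tf$reduce_mean(-tf$reduce_sum(y_*y, reduction_indices = 1L))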
opt   <- tf$train$GradientDescentOptimizer(0.9)
train <- opt$minimize(loss)
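
The loss defined above multiplies the one-hot labels by the softmax probabilities directly, whereas the cross-entropy that is usually paired with softmax takes the log of the probabilities first. The base R sketch below compares the two on a tiny made-up example; `softmax()`, `logits`, and `labels` are illustrative names and values, not taken from the post.

```r
# Row-wise softmax for a matrix of scores
softmax <- function(z) {
  z <- z - apply(z, 1, max)  # subtract each row's max for numerical stability
  e <- exp(z)
  e / rowSums(e)
}

# Two made-up observations with three classes each
logits <- matrix(c(2, 0, 0,
                   0, 1, 0), nrow = 2, byrow = TRUE)
labels <- matrix(c(1, 0, 0,
                   0, 0, 1), nrow = 2, byrow = TRUE)

p <- softmax(logits)

# Loss in the form used in the post: minus the mean probability of the true class
loss_as_written <- mean(-rowSums(labels * p))

# Standard cross-entropy: minus the mean log-probability of the true class
cross_entropy <- mean(-rowSums(labels * log(p)))

cat(loss_as_written, cross_entropy, "\n")
```

The first quantity is bounded between -1 and 0, while cross-entropy is non-negative and grows without bound as the probability of the true class approaches zero, so the two penalize confident mistakes quite differently.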

## Data for the analysis (iris)
trn_x <- iris %>% 
  select(-Species) %>% 
  as.matrix()
trn_y <- iris %>% 
  dummyVars(formula = ~ Species, sep = NULL) %>% 
  predict(object = ., newdata = iris) %>% 
  as.matrix()
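
The `dummyVars()` step above turns Species into a one-hot matrix with one column per class. For readers without caret, roughly the same matrix can be built in base R; this is an alternative sketch, not what the post itself uses, and the column names it produces differ (they are prefixed with "Species").

```r
# One-hot encode Species with base R; "- 1" drops the intercept so that
# every level gets its own 0/1 column
onehot <- model.matrix(~ Species - 1, data = iris)
dim(onehot)
colSums(onehot)

# Recover the class index of each row (same idea as apply(..., 1, which.max))
idx <- max.col(onehot, ties.method = "first")
all(idx == as.integer(iris$Species))
```

The last line checks that taking the position of the 1 in each row gives back the original factor levels, which is the same round trip the confusion tables below rely on.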

sess <- tf$Session()
sess$run(tf$initialize_all_variables())

## Run the training
N    <- 20000
each <- 1000
resAll <- matrix(NA, N, 15)
for (i in 1:N){
  sess$run(train, dict(x=trn_x, y_=trn_y))
  resAll[i, ] <- c(sess$run(W), sess$run(b))
  if (i %% each == 0) {
    cat(i, "-", sess$run(W), sess$run(b), "\n")
    print(table(
      predict = sess$run(fetches = y, feed_dict = dict(x = trn_x, y_ = trn_y)) %>% apply(MARGIN = 1, FUN = which.max),
      true = trn_y %>% apply(MARGIN = 1, FUN = which.max)))
  }
}
1000 - 1.820591 3.826579 -5.199587 -2.53192 1.656648 0.04738447 -0.9168726 -2.692287 -3.47724 -3.873963 6.116461 5.224204 0.8330443 1.588295 -2.421341 
       true
predict  1  2  3
      1 50  0  0
      2  0 46  0
      3  0  4 50
2000 - 2.170959 4.40071 -6.013396 -2.967016 1.705534 0.348204 -1.100592 -3.198786 -3.876493 -4.748915 7.113993 6.165799 0.9800377 2.52502 -3.505058 
       true
predict  1  2  3
      1 50  0  0
      2  0 47  0
      3  0  3 50
：
19000 - 4.02061 6.796506 -9.696184 -4.990173 2.587524 0.9785848 -2.744421 -5.431025 -6.608002 -7.775012 12.4406 10.42116 1.755378 8.18106 -9.936396 
       true
predict  1  2  3
      1 50  0  0
      2  0 47  0
      3  0  3 50
20000 - 4.081785 6.860996 -9.807202 -5.050861 2.61721 0.9981934 -2.800498 -5.49942 -6.698854 -7.859092 12.60772 10.55024 1.781228 8.357506 -10.13868 
       true
predict  1  2  3
      1 50  0  0
      2  0 47  0
      3  0  3 50

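I ran about 20,000 iterations, but the parameters have not converged, perhaps because the classification is harder than expected. The classification accuracy also reached its current level early on and did not improve at all after that. This may depend on settings such as the learning rate and the initial values.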

Finally, let's compare these results with other classifiers. For comparison I used glmnet and random forest.

## Compare with the glmnet result
resGLM <- glmnet(trn_x, iris$Species, family = "multinomial")
table(predict = predict(resGLM, newx = trn_x, type = "class", s = 0.01)[, 1], true = iris$Species)
            true
predict      setosa versicolor virginica
  setosa         50          0         0
  versicolor      0         47         1
  virginica       0          3        49

## Compare with the random forest result
resRF <- ranger(Species ~., data=iris, mtry=2, num.trees = 500, write.forest=TRUE)
table(predict = predict(resRF, data=iris)$predictions, true=iris$Species)
            true
predict      setosa versicolor virginica
  setosa         50          0         0
  versicolor      0         50         0
  virginica       0          0        50

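Random forest is as accurate as you would expect. Iris is an easy dataset to classify, so TensorFlow and glmnet also classify it reasonably well. Note that all of these tables are computed on the same data used for training, so they show in-sample fit rather than accuracy on unseen data.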
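
All three confusion tables above are computed on the same 150 rows that were used for fitting. If you wanted to compare the classifiers on held-out data instead, one option is a stratified split. The sketch below uses only base R; the 35/15 split per species and the names `train_idx`, `train`, and `test` are arbitrary choices of mine, and the ranger call is left as a comment because it needs the package loaded.

```r
set.seed(123)

# Stratified split: sample 35 of the 50 row indices within each species
train_idx <- unname(unlist(lapply(
  split(seq_len(nrow(iris)), iris$Species),
  function(idx) sample(idx, size = 35)
)))

train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

table(train$Species)
table(test$Species)

# Fit on train and evaluate on test, e.g. with ranger as in the post:
# fit <- ranger(Species ~ ., data = train, mtry = 2, num.trees = 500)
# table(predict = predict(fit, data = test)$predictions, true = test$Species)
```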

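Thoughts

Having walked through the TensorFlow workflow in R, my honest impression is a strong sense of "this isn't quite it". What bothers me most is that TensorFlow has to be installed beforehand; for users who mainly work in R, setting up a Python environment looks like a high hurdle. It would be nice if TensorFlow itself could be installed with install.packages(), like other libraries.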

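Another point, presumably because a TensorFlow built for Python is being called from R, is that the code does not feel very R-like. The "tf$Session()" part in particular looks so much like Python that I find it a little hard to approach.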

Of course this is just how I feel, and it should be no problem at all for anyone who is also comfortable with Python. Still, mxnet looks like the better fit for me, so I will keep evaluating a bit longer.