Using stringstream

When writing a simple parser for input, std::stringstream makes a convenient tokenizer:

std::string expression = "(2+3)*4";
std::stringstream ss(expression);
while(ss)
{
  // ... read the next token from ss ...
}

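At first glance this looks fine: it checks the stream's state and leaves the loop once the stream has failed. In fact the pattern is not entirely safe, because a stream that still tests true can run into the end of input on the very next read.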

Here is another example:

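#include <iostream>
#include <sstream>
#include <string>

int main()
{
  std::string s = "123456";
  std::stringstream ss(s);
  while(ss)
  {
    // peek() returns int_type; store it in an int, because a plain char
    // cannot portably hold the EOF value (-1)
    int c = ss.peek();
    std::cout << static_cast<char>(c) << ", " << c << " , EOF = " << ss.eof() << std::endl;
    ss.get();  // extract the character we just peeked at
  }
  return 0;
}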

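You might expect this to print only the ASCII codes of 1 through 6, but there is one extra line showing -1. In other words, after 6 has been extracted the stream is still not at eof; the eofbit is set only when the following peek() finds no character.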
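To see the extra iteration as data rather than console output, here is a small sketch that records what peek(), eof() and fail() report on each pass of the same loop. Step and trace_peek_loop are hypothetical helper names used only for illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// One entry per pass of the while(ss) loop (hypothetical helper type).
struct Step
{
  int peeked;  // value returned by peek(); traits::eof() at end of input
  bool eof;    // ss.eof() right after the peek
  bool fail;   // ss.fail() right after the peek
};

// Runs the same loop as the example above and records the stream state
// after every peek() instead of printing it.
std::vector<Step> trace_peek_loop(const std::string& s)
{
  std::stringstream ss(s);
  std::vector<Step> steps;
  while (ss)
  {
    int c = ss.peek();
    steps.push_back({c, ss.eof(), ss.fail()});
    ss.get();  // once eofbit is set, this get() sets failbit and ends the loop
  }
  return steps;
}
```

For "123456" the sketch should record seven steps: six characters with eofbit clear, then a seventh whose peek() value equals std::char_traits<char>::eof(), with eofbit set and failbit still clear. Only the get() after that sets failbit, which is what finally ends the loop.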

So take care when reading from a stringstream: a valid ss does not mean the next read will succeed. You still need to check the result after each read operation.
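One way to apply this to the tokenizer from the start of the post is to test the value each read returns instead of testing the stream before reading. This is a sketch under the assumption that the input contains only non-negative integer literals, single-character operators, parentheses and whitespace; tokenize is a hypothetical helper, not part of any library.

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Splits a simple arithmetic expression into integer literals and
// single-character tokens. Each read is checked after it happens.
std::vector<std::string> tokenize(const std::string& expression)
{
  std::stringstream ss(expression);
  std::vector<std::string> tokens;
  for (;;)
  {
    int c = ss.peek();
    if (c == std::char_traits<char>::eof())
      break;  // no more input (or the stream has already failed)
    if (std::isspace(c))
    {
      ss.get();  // skip whitespace between tokens
    }
    else if (std::isdigit(c))
    {
      long value;
      if (!(ss >> value))  // check the formatted read itself
        break;
      tokens.push_back(std::to_string(value));
    }
    else
    {
      // operator or parenthesis: a single character
      tokens.push_back(std::string(1, static_cast<char>(ss.get())));
    }
  }
  return tokens;
}
```

With this shape, tokenize("(2+3)*4") should yield "(", "2", "+", "3", ")", "*", "4". A failed numeric conversion (for example an out-of-range literal) simply stops tokenizing here; a real parser would probably want to report it.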

For peek(), the C++11 standard specifies the following behavior:

int_type peek();

Effects: Behaves as an unformatted input function (as described in 27.7.2.3, paragraph 1). After constructing a sentry object, reads but does not extract the current input character.
Returns: traits::eof() if good() is false. Otherwise, returns rdbuf()->sgetc().

C++11 27.7.2.3
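The Returns clause also means that an EOF result from peek() does not by itself prove the input is exhausted. The sketch below (peek_when_not_good is a hypothetical name) sets failbit on a stream that still holds data and then peeks.

```cpp
#include <sstream>
#include <string>
#include <utility>

// Puts a fresh stream into a non-good state, then peeks.
// Returns the peek() result and whether eofbit ended up set.
std::pair<int, bool> peek_when_not_good(const std::string& s)
{
  std::stringstream ss(s);
  ss.setstate(std::ios_base::failbit);  // good() is now false
  int c = ss.peek();                    // sentry fails; rdbuf()->sgetc() is not called
  return std::make_pair(c, ss.eof());
}
```

For "abc", peek() should return std::char_traits<char>::eof() while eof() stays false, because the sentry fails before the buffer is consulted. If the distinction matters, check eof() separately from the returned value.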

What an input function does when sgetc() returns eof is described in C++11 27.7.2.1:

If rdbuf()->sbumpc() or rdbuf()->sgetc() returns traits::eof(), then the input function, except as explicitly noted otherwise, completes its actions and does setstate(eofbit), which may throw ios_base::failure (27.5.5.4), before returning.

C++11 27.7.2.1
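The same rule covers sbumpc(), which get() uses to extract a character. As a sketch (EndState and read_past_end are hypothetical names), extracting every character of the string leaves eofbit clear, and only the next get(), which finds nothing, sets it.

```cpp
#include <sstream>
#include <string>

// Stream state around the end of input (hypothetical helper type).
struct EndState
{
  bool eof_after_last_char;  // eofbit after extracting the final character
  int extra_get;             // value returned by one more get()
  bool eof;                  // eofbit after that extra get()
  bool fail;                 // failbit after that extra get()
};

EndState read_past_end(const std::string& s)
{
  std::stringstream ss(s);
  for (std::string::size_type i = 0; i < s.size(); ++i)
    ss.get();  // each call extracts a real character via sbumpc()
  EndState st;
  st.eof_after_last_char = ss.eof();
  st.extra_get = ss.get();  // sbumpc() now returns traits::eof()
  st.eof = ss.eof();
  st.fail = ss.fail();
  return st;
}
```

For "123456" the sketch should report eof_after_last_char as false, then an extra get() returning traits::eof() with both eofbit and failbit set. The failbit comes from get()'s own specification (nothing was extracted), not from the rule quoted above.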

That is, peek() itself sets the eofbit: when its call to sgetc() returns traits::eof(), the rule above applies.

There is also an online discussion worth reading, but note that it is fairly old, so some of what it says may not reflect newer standards:

https://comp.lang.cpp.moderated.narkive.com/vwstw4Un/std-stringstream-and-eof-strangeness

This entry was posted in C++ Language.