C++ Essay

nodejs quick install

Posted on November 23, 2019 by Enrico

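If you are not installing nodejs on Linux through an installer package or the yum package manager, the quickest way is to download the binary distribution directly from the official site. The steps below were tested on CentOS 6.9.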

https://nodejs.org/dist/latest-v10.x/
Linux: https://nodejs.org/dist/latest-v10.x/node-v10.17.0-linux-x64.tar.gz

tar xf node-v10.17.0-linux-x64.tar.gz

Copy bin/node to /usr/local/bin (run from inside the extracted bin directory):

cp node /usr/local/bin/

Install npm: the extracted binary package ships with npm. Use that npm to fetch the latest npm and install it system-wide.

./npm -g install npm

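If you only use nodejs to run programs, npm is not needed; the node executable alone is enough. The archive also contains an include directory, which is needed when building binary addons. It can be copied directly to /usr/local/include (run from the top of the extracted directory):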

cp -r include/node/ /usr/local/include/

Posted in nodejs

Notes on the struct hack in C

Posted on November 18, 2019 by Enrico

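In C there is a technique called the struct hack, whose main purpose is to implement a struct of dynamic size.

Suppose the information to record includes a name field. It could be designed like this: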

struct person_info
{
  int age;
  char name[32];
};

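However, this assumes the string stored in name is at most 31 + 1 bytes (a null-terminated string).
To support a dynamic length, a pointer has to be used instead, for example: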

struct person_info
{
  char gender;
  char* name;
};

struct person_info alice;
alice.gender = 'F';
alice.name = malloc(sizeof("alice"));
strcpy(alice.name, "alice");

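The drawback is that the memory for name is allocated separately and does not sit next to person_info, so it may not benefit from the CPU cache.

The struct hack makes the tail of the struct dynamically sized by defining a placeholder as the very last member: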

struct person_info
{
  char gender; //'M', 'F'
  char name[0]; //zero length array
};

//sizeof(struct person_info) => 1

struct person_info* alice = malloc(sizeof(struct person_info) + sizeof("alice"));

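malloc allocates one block that contains the space needed for the struct members; the remaining space can be accessed through name.

Note that name serves as a placeholder here, so you do not have to compute the starting position after the struct yourself. This matters because alignment padding is sometimes involved.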

alice->gender = 'F';
strcpy(alice->name, "alice");

struct person_info
{
  char gender; //'M', 'F' (occupies byte offset 0)
               //byte offsets 1, 2, 3 are padding
  int data[0]; //starts at byte offset 4...
};

//sizeof(struct person_info) => 4
//data is aligned on a 4-byte boundary.

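The struct hack only works when the variable-sized member is the last one. In C99 this can be expressed with a flexible array member: char name[];

A flexible array member has an incomplete array type, so sizeof cannot be applied to it directly.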

§6.2.5 #12 An array type of unknown size is an incomplete type.

§6.5.3.4 The sizeof operator
The sizeof operator shall not be applied to an expression that has function type or an incomplete type.

However, the standard makes an exception:

§6.7.2.1, paragraph 16
As a special case, the last element of a structure with more than one named member may have an incomplete array type; this is called a flexible array member. … the size of the structure shall be equal to the offset of the last element of an otherwise identical structure that replaces the flexible array member with an array of unspecified length
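The flexible array member form described above can be sketched as follows. This is a minimal sketch, not a definitive implementation: the helper name person_create is hypothetical, and it assumes the caller releases the block with free().

```c
#include <stdlib.h>
#include <string.h>

/* C99 flexible array member: name[] must be the last member. */
struct person_info
{
    char gender;   /* 'M' or 'F' */
    char name[];   /* backed by the extra bytes requested from malloc */
};

/* Hypothetical helper: allocates the struct and the name in one block.
   Returns NULL on allocation failure; the caller frees with free(). */
struct person_info *person_create(char gender, const char *name)
{
    size_t len = strlen(name) + 1;  /* include the terminating '\0' */
    struct person_info *p = malloc(sizeof(struct person_info) + len);
    if (p == NULL)
        return NULL;
    p->gender = gender;
    memcpy(p->name, name, len);     /* name lives right after the fixed members */
    return p;
}
```

Because sizeof(struct person_info) does not include the array, the length of the string has to be added explicitly when calling malloc.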

Also note that because the flexible array member was introduced in C99, before that a zero-length array was commonly used as the placeholder.

A zero-length array is in fact a gcc extension. Standard C does not allow zero-length arrays.

§6.7.5.2 Array declarators
If the expression is a constant expression, it shall have a value greater than zero. The element type shall not be an incomplete or function type.

The constant expression mentioned above is the N in Array[N], which is required to be > 0.

References:

https://wiki.sei.cmu.edu/confluence/display/c/DCL38-C.+Use+the+correct+syntax+when+declaring+a+flexible+array+member

https://www.geeksforgeeks.org/struct-hack/

Posted in C Language

Notes on Diffie-Hellman Key Exchange

Posted on November 5, 2019 by Enrico

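When two parties want to communicate with encryption, they need to agree on a key. Whether one key is used for both encryption and decryption or separate keys are used, both parties need some way to obtain the key. The simplest approach is to configure the key in advance, or to transmit it over a separate secure channel.

But if the key has to travel over a secure channel, where does that secure channel come from? It might be a separate physical line, or an encrypted channel using a specific key the two parties privately agreed on beforehand. Both approaches require prior arrangement. Is there a way to negotiate, directly over an insecure channel and without any parameters agreed in advance, a key that only the two parties know?

This question is equivalent to the following: if someone eavesdrops on an encrypted conversation between two strangers from start to finish (including the key negotiation phase), can the eavesdropper be prevented from learning the original conversation, that is, from recovering the key the two parties use? The answer is yes. Note that eavesdropping here means passively observing from the side; it is not a MITM (man-in-the-middle) attack.

For a more detailed description of this problem, see https://en.wikipedia.org/wiki/Key_exchange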

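The Diffie-Hellman key exchange algorithm solves this problem:

Even over an insecure channel, the two parties can still agree on the same key, and an eavesdropper cannot easily learn that key.

It starts from two given numbers, p and g:
p is a prime and g is a primitive root modulo p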
https://en.wikipedia.org/wiki/Primitive_root_modulo_n

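Alice and Bob exchange p and g over the insecure channel (so a third party may learn them).

Alice picks a number a, computes A = g^a mod p, and sends A to Bob
Bob picks a number b, computes B = g^b mod p, and sends B to Alice
Alice takes the B she received and computes K1 = B^a mod p
Bob takes the A he received and computes K2 = A^b mod p
In fact K1 = K2, which means they have agreed on the same key

A third party may obtain p, g, A, and B, but cannot easily compute K from them
a is known only to Alice, and b is known only to Bob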
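The exchange above can be sketched in C with toy-sized numbers. This is only an illustration under stated assumptions: the function names mod_pow, dh_public_value, and dh_shared_key are hypothetical, and the arithmetic is only safe for moduli below 2^32, far too small for real use.

```c
#include <stdint.h>

/* Modular exponentiation by repeated squaring: returns (base^exp) mod m.
   Assumes m < 2^32 so that every intermediate product fits in uint64_t. */
uint64_t mod_pow(uint64_t base, uint64_t exp, uint64_t m)
{
    uint64_t result = 1 % m;
    base %= m;
    while (exp > 0) {
        if (exp & 1)
            result = (result * base) % m;
        base = (base * base) % m;
        exp >>= 1;
    }
    return result;
}

/* The value a party sends over the insecure channel: g^secret mod p. */
uint64_t dh_public_value(uint64_t g, uint64_t secret, uint64_t p)
{
    return mod_pow(g, secret, p);
}

/* The key a party derives from the other side's public value and its own secret. */
uint64_t dh_shared_key(uint64_t other_public, uint64_t own_secret, uint64_t p)
{
    return mod_pow(other_public, own_secret, p);
}
```

With the illustrative values p = 23, g = 5, a = 6, and b = 15, this gives A = 8 and B = 19, and both dh_shared_key(19, 6, 23) and dh_shared_key(8, 15, 23) evaluate to 2.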

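The following shows that K1 = K2, where M and M' are the integer quotients of g^a and g^b divided by p.

g^a = M * p + A
A = g^a - M * p
A^b = (g^a - M * p)^b
K2 = A^b mod p = (g^a - M * p)^b mod p = g^(ab) mod p (after binomial expansion, every term except (g^a)^b contains a factor of p)

g^b = M' * p + B
B = g^b - M' * p
B^a = (g^b - M' * p)^a
K1 = B^a mod p = (g^b - M' * p)^a mod p = g^(ba) mod p

K1 = K2 = g^(ab) mod p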

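In practice, a, b, and p are chosen to be large numbers. Because K is determined mod p, if p is too small K can only be a number in {0…p-1} and is easy to guess.

This only handles encrypting the channel and has no notion of authentication, so it cannot stop a MITM (man-in-the-middle) attack, but it does prevent eavesdropping. It is commonly used to encrypt a session at connection setup; for example, ssh uses a variant of this algorithm for its initial key exchange.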

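Another application is end-to-end encryption in mobile messaging apps. In some cases messages between two clients are relayed through a server over encrypted connections (client-server rather than p2p). Although the client-server connections are encrypted and outsiders cannot eavesdrop, the server can see the plain text. With D-H key exchange, even the server does not know the key, so only the two clients can encrypt and decrypt the message content. The whole system then has both transport-level connection encryption and application-level message encryption.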

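Because the session key is not decided in advance, it can also be used to achieve forward secrecy.
Forward secrecy mainly prevents session content from being decrypted later because a key leaked for some reason. (For example, if the encryption key can be derived from a password, or a fixed private key is used for encryption, then once that key leaks at some point in the future, any recorded traffic between the two parties can potentially be decrypted. Forward secrecy is achieved by using a separate, independent key as the session key.)

References: https://en.wikipedia.org/wiki/Diffie%E2%80%93Hellman_key_exchange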

https://www.digitalocean.com/community/tutorials/understanding-the-ssh-encryption-and-connection-process

https://security.stackexchange.com/questions/76894/how-does-ssh-use-both-rsa-and-diffie-hellman

Posted in General

v8 build from source

Posted on October 24, 2019 by Enrico

Note: Python must be 2.7, otherwise the gm.py step will fail.

Linux

Set up the build environment
https://commondatastorage.googleapis.com/chrome-infra-docs/flat/depot_tools/docs/html/depot_tools_tutorial.html

git clone https://chromium.googlesource.com/chromium/tools/depot_tools.git

export PATH=$PATH:~/depot_tools

mkdir v8
cd v8
fetch v8

Note that a .gclient file will be created under ~/v8, so do not use the home directory itself as the working directory.

solutions = [
  {
    "url": "https://chromium.googlesource.com/v8/v8.git",
    "managed": False,
    "name": "v8",
    "deps_file": "DEPS",
    "custom_deps": {},
  },
]

gclient sync

cd v8
./build/install-build-deps.sh

The next step is a helper script that combines generating the build files, compiling, and even testing into a single step.

./tools/dev/gm.py x64.release

Posted in nodejs

detect elevated privilege execution in windows

Posted on October 23, 2019 by Enrico

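var child_process = require('child_process');

// 'fsutil dirty query' needs administrator rights, so it fails when the process is not elevated
child_process.exec('fsutil dirty query %systemdrive%', function(err, stdout){
    if(err){
        console.log('error', err);  // not elevated
        return;
    }
    // OK, running with elevated privilege
});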

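To detect whether the process is running with UAC privilege escalation, simply checking whether the user is an Administrator does not work. Instead, run a specific command to confirm whether the current execution has elevated privilege.
With this approach there is no need to integrate a C API in nodejs.

Reference: https://stackoverflow.com/questions/4051883/batch-script-how-to-check-for-admin-rights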

Posted in nodejs

Keywords (annotated)

Posted on September 23, 2019 by Enrico

The above tokens (case sensitive) are reserved (in translation phases 7 and 8) for use as keywords, and shall not be used otherwise.

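Keywords only exist from phase 7 onward; before that they are identifiers among the preprocessing tokens.

The keywords in the red box above are the ones added in C99. _Bool is spelled _Bool rather than bool mainly because, in the C89 standard, identifiers that begin with an underscore followed by an uppercase letter, or with two underscores, are reserved. This lets later standards add keywords without clashing with user code. The standard also states "Implementation-defined keywords shall have the form of an identifier reserved for any use as described in 7.1.3", for the same reason.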

Posted in C Language

Identifier (annotated)

Posted on September 23, 2019 by Enrico

Reference: https://en.cppreference.com/w/c/language/identifier

On the C99 definition of an identifier

6.4.2.1 An identifier is a sequence of nondigit characters (including the underscore _, the lowercase and uppercase Latin letters, and other characters) and digits.

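C99 also supports universal character names (UCNs), which begin with \u or \U, as identifier characters (they may appear at the start of an identifier). \u is followed by 4 hex digits and \U by 8 hex digits.

Naturally, the restriction that an identifier cannot begin with a digit still applies: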

Each universal character name in an identifier shall designate a character whose encoding in ISO/IEC 10646 falls into one of the ranges specified in annex D.59) The initial character shall not be a universal character name designating a digit.

6.2.1 An identifier can denote an object; a function; a tag or a member of a structure, union, or enumeration; a typedef name; a label name; a macro name; or a macro parameter

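A macro name is also an identifier, but by compilation (phase 7) there are no macros left. In fact, the identifier referred to here is also used for preprocessing tokens, and during preprocessing identifiers are only divided into macro names and non-macro names. See §6.10.1, footnote 140: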

Because the controlling constant expression is evaluated during translation phase 4, all identifiers either are or are not macro names — there simply are no keywords, enumeration constants, etc

6.4.2.1 When preprocessing tokens are converted to tokens during translation phase 7, if a preprocessing token could be converted to either a keyword or an identifier, it is converted to a keyword

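This passage mainly says that at the preprocessing-token stage there is no concept of a keyword (see the 6.4 Lexical elements Syntax above), so the tokenizer marks such a token as an identifier. In phase 7, when preprocessing tokens are converted to tokens, it is then decided whether each one counts as a keyword or an identifier.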

Posted in C Language

C99 Terms, definitions, and symbols (annotated)

Posted on September 19, 2019 by Enrico

A summary of the key points of some of the terms defined in the C99 section Terms, definitions, and symbols. There are 19 terms in total.

3.1 access

In the standard, access means accessing an object: to read or modify it. Modify here includes storing the same value into the object.

‘Modify’ includes the case where the new value being stored is the same as the previous value

Also, if an expression is not evaluated, there is no access (for example, under short-circuit evaluation).

Expressions that are not evaluated do not access objects.
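The short-circuit case can be illustrated with a small sketch. The names read_count, read_value, and is_positive are hypothetical, and the counter exists only to make the access observable.

```c
/* Counts how many times the pointed-to object is actually read. */
int read_count = 0;

/* Reads *p; this read is the access in the sense of 3.1. */
int read_value(const int *p)
{
    ++read_count;
    return *p;
}

/* Returns 1 when p is non-null and *p is positive.
   If p is null, the right operand of && is not evaluated,
   so the object is never accessed. */
int is_positive(const int *p)
{
    return p != 0 && read_value(p) > 0;
}
```

Calling is_positive with a null pointer leaves read_count unchanged, which is what makes this idiom safe.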

3.2 alignment

alignment requirement that objects of a particular type be located on storage boundaries with addresses that are particular multiples of a byte address

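Alignment here refers to the address of an object of a particular type, for example a word boundary (an address divisible by 4 or 8). The alignment requirement depends on the machine architecture and also on the instruction. For most x86 instructions the data address does not have to be aligned, although access is slower when the data is not aligned on its boundary. However, __float128 on i386, for example, must be aligned on a 16-byte boundary.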
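One way to observe an alignment requirement is to look at where the compiler places a member inside a struct. The sketch below uses offsetof from <stddef.h>; the struct name padded and the function int_member_offset are hypothetical, and the value 4 mentioned in the comment is typical for common platforms rather than guaranteed.

```c
#include <stddef.h>

/* A char followed by an int: the compiler inserts padding after c
   so that i starts on a boundary suitable for int. */
struct padded
{
    char c;
    int  i;
};

/* Offset of i inside struct padded. On common platforms this is 4,
   but the exact value is implementation-defined. */
size_t int_member_offset(void)
{
    return offsetof(struct padded, i);
}
```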

Posted in C Language

Phases of translation (annotated)

Posted on September 19, 2019 by Enrico

C99 defines 8 translation phases. They describe the processing flow from C source code to program image.

Compiled with reference to https://en.cppreference.com/w/c/language/translation_phases and the ISO C99 standard.

The C source file is processed by the compiler as if the following phases take place, in this exact order. Actual implementation may combine these actions or process them differently as long as the behavior is the same.

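This says that a compiler implementation only needs to behave equivalently to the steps described here.

Phase 1 mainly performs character set conversion and converts the end-of-line indicator <EOL>.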

Physical source file multibyte characters are mapped, in an implementation defined manner, ① to the source character set(introducing ② new-line characters for end-of-line indicators) if necessary. ③ Trigraph sequences are replaced by corresponding single-character internal representations.

The source character set mentioned here concerns how the source file is interpreted (its encoding). The source character set is described in 5.2.1 of the standard:

Two sets of characters and their associated collating sequences shall be defined: the set in which source files are written (the source character set), and the set interpreted in the execution environment (the execution character set).

More detailed explanations of the source character set can be found at the following links:
https://stackoverflow.com/questions/27872517/what-are-the-different-character-sets-used-for
https://stackoverflow.com/questions/15558977/characters-defined-using-uxxxx-format-display-the-wrong-character

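The gcc cpp (preprocessor) documentation also has an explanation worth reading (see https://gcc.gnu.org/onlinedocs/cpp/Character-sets.html#Character-sets ). Note, however, that the gcc cpp documentation differs slightly from the standard: the source character set in the standard should correspond to what the gcc cpp documentation calls the input charset, whereas the source character set in the cpp documentation refers to the character set used for internal processing once the actual C source file has been read in at this stage.

C99 does not specify the compiler's internal representation here. The C99 Rationale (page 20) mentions some differences between C and C++ in this area: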
http://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf

By contrast, phase 1 in C++ converts to the basic source character set plus UCNs. Excerpt from https://en.cppreference.com/w/cpp/language/translation_phases :
Any source file character that cannot be mapped to a character in the basic source character set is replaced by its universal character name (escaped with \u or \U) or by some implementation-defined form that is handled equivalently.

Besides the source character set mapping, phase 1 also converts <EOL> to <LF>. The new-line character mentioned in later phases comes from this step.

phase 2 line splicing: a trailing \ at the end of a line is removed to join lines

Each instance of a backslash character (\) immediately followed by a new-line character is deleted, splicing physical source lines to form logical source lines. (Note: the new-line character here is the one converted from <EOL> in phase 1.)

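This specifies that only \ + <LF> is recognized when splicing lines. What happens with \ + whitespace + <LF>? By the description above it should not be spliced. The gcc implementation relaxes this restriction, however, and only issues a warning, as explained in the following:

The preprocessor treatment of escaped newlines is more relaxed than that specified by the C90 standard, which requires the newline to immediately follow a backslash.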
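Line splicing in phase 2 can be seen in a short sketch. The names SQUARE, spliced, and square_of are hypothetical and exist only for illustration.

```c
/* The backslash-newline pair in the macro below is deleted in translation
   phase 2, so the definition reaches the preprocessor as one logical line. */
#define SQUARE(x) \
    ((x) * (x))

/* Splicing also happens inside a string literal: the two physical lines
   form one literal. The second line starts at column 0 on purpose,
   because leading spaces would become part of the string. */
const char *spliced = "hello, \
world";

int square_of(int v)
{
    return SQUARE(v);
}
```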

In addition, at this stage the standard also requires that 1. the file ends with <LF>, and 2. the file does not end with \ + <LF>.

A source file that is not empty shall end in a new-line character,
which shall not be immediately preceded by a backslash character before any such splicing takes place

Note that the standard uses "shall" here, which means a violation is undefined behavior. gcc merely issues a warning.

phase 3 tokenization(for preprocessing)

The source file is decomposed into preprocessing tokens and sequences of white-space characters (including comments)

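In this phase the source file is divided into white space (including comments; each comment is replaced by one <SPACE>) and preprocessing tokens. Preprocessing tokens are described in detail in 6.4.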

A preprocessing token is the minimal lexical element of the language in translation phases 3 through 6.

This refers to the tokens seen by the preprocessor before compilation (phase 7).

Each comment is replaced by one space character. New-line characters are retained. Whether each nonempty sequence of white-space characters other than new-line is retained or replaced by one space character is implementation-defined

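Each comment is replaced by one <SPACE> (note the word "each"). Apart from new-line characters, which must be retained, other white space (one or more characters) may be either retained or replaced by one <SPACE>.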

phase 4 preprocessing

This phase executes all preprocessing directives and macro expansion.

What is worth noting here is the description of #include:

A #include preprocessing directive causes the named header or source file to be processed from phase 1 through phase 4, recursively

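This says that #include is processed recursively, which also explains why the compiler reports an error when a.h and b.h include each other (unless #pragma once or #ifdef guards are used to prevent it. P.S. #pragma once is not part of the standard).

phase 5 mapping to execution character set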

Each source character set member and escape sequence in character constants and string literals is converted to the corresponding member of the execution character set;

This step mainly converts to the execution character set, for the compiler.

Reference: https://stackoverflow.com/questions/3768363/character-sets-not-clear

phase 6 string concatenation

Adjacent string literal tokens are concatenated
This is plain string literal concatenation; by this step the literals are already in the execution character set. "str1" "str2" is joined into "str1str2".
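A minimal sketch of phase 6 in practice follows. The names joined and joined_length are hypothetical.

```c
#include <string.h>

/* Adjacent string literals are joined in translation phase 6,
   so this initializer is the single literal "str1str2". */
const char joined[] = "str1" "str2";

/* Length of the joined literal, not counting the terminating '\0'. */
size_t joined_length(void)
{
    return strlen(joined);
}
```

Since the concatenation happens before compilation proper, sizeof(joined) already reflects the combined literal plus one terminating null character.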

phase 7 compile

Each preprocessing token is converted into a token. The resulting tokens are syntactically and semantically analyzed and translated as a translation unit.

This step is compilation: the tokens are compiled as a translation unit.

phase 8 link

All external object and function references are resolved. Library components are linked to satisfy external references to functions and objects not defined in the current translation. All such translator output is collected into a program image which contains information needed for execution in its execution environment

The last step is linking, which finally produces the program image.

Posted in C Language

ASCII (annotated)

Posted on September 17, 2019 by Enrico

Some supplementary notes, with reference to https://en.cppreference.com/w/c/language/ascii

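ASCII defines 0x00-0x7F. Anything beyond that (the 8th bit) belongs to extended 8-bit sets, whose definitions vary by standard; early on some systems used that bit as a parity bit.
There are 128 characters in total, of which 95 are printable (0x20-0x7E). The first 32 and the last one are control characters (0x00-0x1F/0-31, 0x7F/127).
The digits are arranged to match the BCD bit pattern (prefixed with 011 -> hex 3). Simply put, 0 -> 0x30 and 9 -> 0x39, so & 0x0F yields the numeric value.
The table can be viewed as four blocks of 32 characters each: the first block is control characters; the ordering of the second block (digits) and the third block (uppercase letters) has some historical reasons, which can be compared with DEC SIXBIT below; the fourth block is lowercase letters.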
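The digit layout described above can be sketched in C as follows. The function names ascii_digit_value and ascii_digit_char are hypothetical, and the code assumes the execution character set is ASCII.

```c
/* Converts an ASCII digit character to its numeric value by masking
   the low nibble; returns -1 for anything outside '0'..'9'. */
int ascii_digit_value(char ch)
{
    if (ch < 0x30 || ch > 0x39)
        return -1;
    return ch & 0x0F;
}

/* The reverse direction: OR the 0x30 prefix back in. Expects 0..9. */
char ascii_digit_char(int value)
{
    return (char)(0x30 | value);
}
```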