Category Archives: C Language

C++11 unordered_set benchmark

Conclusion: insert, erase, find: O(n) (total time for n elements); insert ~ 100N, find ~ 50N (N is the walk time, used as the baseline multiple). Compared against preallocating 1M buckets with rehash(1000000).

No preallocation:
walk: 0 μs (100)
walk: 2 μs (1000)
walk: 19 μs (10000)
walk: 196 μs (100000)
walk: 1988 μs (1000000)
insert: 40 μs (100)
insert: 309 μs (1000)
insert: … Continue reading

Posted in C Language | Leave a comment

two’s complement overflow detection

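Two's complement overflow occurs in two cases (carry in and carry out refer to the sign bit):

carry in without carry out
carry out without carry in

Take an 8-bit signed int as an example: -100 + (-30)

        1001 1100
      + 1110 0010
      -----------
      1 0000 0000   <- carry bit
C S = 1 0111 1110

The C position is the carry out (what we usually just call the carry), … Continue reading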
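The rule above can be sketched in C as follows. This is a minimal illustration, not code from the original post: the function names add8_overflows and add8_overflow_examples are hypothetical. It computes the carry into bit 7 and the carry out of bit 7 separately and reports overflow when they differ.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if a + b overflows in 8-bit two's complement, else 0.
 * Overflow is detected as: carry into bit 7 differs from carry out of bit 7. */
static int add8_overflows(int8_t a, int8_t b)
{
    unsigned ua = (uint8_t)a;   /* raw 8-bit pattern of a */
    unsigned ub = (uint8_t)b;   /* raw 8-bit pattern of b */

    /* Carry into the sign bit: add only the low 7 bits and inspect bit 7. */
    unsigned carry_in = (((ua & 0x7Fu) + (ub & 0x7Fu)) >> 7) & 1u;

    /* Carry out of the sign bit: bit 8 of the full 9-bit sum. */
    unsigned carry_out = ((ua + ub) >> 8) & 1u;

    return (int)(carry_in ^ carry_out);
}

/* Usage sketch: the -100 + (-30) case from the text plus a few others. */
void add8_overflow_examples(void)
{
    assert(add8_overflows(-100, -30) == 1); /* carry out without carry in */
    assert(add8_overflows(100, 28) == 1);   /* carry in without carry out */
    assert(add8_overflows(100, 27) == 0);   /* 127 still fits */
    assert(add8_overflows(-1, 1) == 0);     /* carry in and carry out */
}
```

Working on the unsigned bit patterns avoids signed overflow in the C code itself, which would otherwise be undefined behavior for wider types.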

one’s & two’s complement number system

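At the lowest level a computer represents data with 0s and 1s, so human symbol systems have to be encoded before a computer can represent them; ASCII, Unicode, and similar standards serve this purpose. For (integer) number systems, the more common encodings are:

ASCII (the number written out in ASCII characters)
-200 => 2D 32 30 30

BCD
189 => 01 08 09 (one digit per byte, i.e. unpacked BCD; packed BCD would be 01 89)

one's complement (assume 8-bit size)
0 => 0000 0000b
-0 => 1111 1111b (unsigned = 255)
-1 => 1111 1110b (unsigned = 254)
127 … Continue reading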
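The one's complement rows above can be reproduced with a short C sketch. The helper names below are hypothetical and the 8-bit width is the same assumption the text makes: a negative value is encoded as the bitwise inverse of its magnitude.

```c
#include <assert.h>
#include <stdint.h>

/* Encodes v (-127..127) as an 8-bit one's complement bit pattern.
 * Negative values are the bitwise inverse of their magnitude. */
static uint8_t ones_complement8(int v)
{
    assert(v >= -127 && v <= 127);
    if (v >= 0)
        return (uint8_t)v;
    return (uint8_t)(~(unsigned)(-v) & 0xFFu);
}

/* The second zero: inverting +0 gives the "negative zero" pattern. */
static uint8_t ones_complement8_negative_zero(void)
{
    return (uint8_t)(~0u & 0xFFu);
}

/* Usage sketch matching the values listed in the text. */
void ones_complement_examples(void)
{
    assert(ones_complement8(0) == 0x00);                /* 0000 0000b */
    assert(ones_complement8(-1) == 0xFE);               /* 1111 1110b = 254 */
    assert(ones_complement8(-5) == 0xFA);               /* inverse of 0000 0101b */
    assert(ones_complement8_negative_zero() == 0xFF);   /* 1111 1111b = 255 */
}
```

Negative zero needs its own helper because a C int has no separate -0 value to pass in.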

Notes on the struct hack in C

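C has a technique called the struct hack, whose main purpose is to implement a struct with a dynamic size. Suppose the information to record includes a name field. The struct can be designed with a fixed-size name array, but that assumes the string stored in name is at most 31 + 1 bytes (a null-terminated string). To support a dynamic length, name has to become a pointer instead. The drawback is that the memory for name is allocated separately and does not sit with person_info, so it may not benefit from the CPU cache.

The struct hack makes the tail of the struct dynamically sized by defining a placeholder as the last member. malloc allocates one block that covers the struct members, and the remaining space is accessed through name. Note that name serves as a placeholder here, so there is no need to compute the start address after the struct by hand, which can go wrong because of alignment padding.

The struct hack only works when the last member is the variable-sized one. In C99 this can be expressed with a flexible array member:

char name[];

A flexible array member is an incomplete array type, so sizeof cannot be applied to it directly.

§6.2.5 #12 An array type … Continue reading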
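A minimal sketch of the C99 flexible array member form might look like the following. The names person_info and name come from the text; the age field and the person_new helper are assumptions added only for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct person_info {
    int  age;      /* hypothetical fixed-size field, not from the original post */
    char name[];   /* C99 flexible array member, must be the last member */
};

/* Allocates the struct and the name bytes in a single block.
 * Returns NULL if allocation fails; the caller frees with free(). */
static struct person_info *person_new(int age, const char *name)
{
    assert(name != NULL);
    size_t len = strlen(name) + 1;                     /* include the '\0' */
    struct person_info *p = malloc(sizeof *p + len);   /* header + tail */
    if (p == NULL)
        return NULL;
    p->age = age;
    memcpy(p->name, name, len);   /* tail is reached through the member */
    return p;
}

/* Usage sketch. */
void person_info_example(void)
{
    struct person_info *p = person_new(30, "Alice");
    if (p != NULL) {
        assert(strcmp(p->name, "Alice") == 0);
        free(p);
    }
}
```

sizeof *p covers only the fixed members (plus any padding), so the string length is added explicitly, and a single free releases both the header and the name.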

Keywords (annotated)

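The above tokens (case sensitive) are reserved (in translation phases 7 and 8) for use as keywords, and shall not be used otherwise.

Keywords only exist from phase 7 onward; before that they are identifiers among the preprocessing tokens. The keywords in the red box above are the ones C99 added. _Bool is spelled _Bool rather than bool mainly because the C89 standard already reserved identifiers that begin with an underscore followed by an uppercase letter, or with two underscores. That gives later standards room to add keywords without clashing with user code. The standard also states:

Implementation-defined keywords shall have the form of an identifier reserved … Continue reading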
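As a small illustration of the _Bool spelling, the sketch below uses the keyword directly and then the bool macro that C99's <stdbool.h> layers on top of it. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>   /* C99: provides bool, true, false on top of _Bool */

/* Uses the keyword itself: any nonzero scalar converts to 1. */
static int via_keyword(int v)
{
    _Bool b = v;
    return b;
}

/* Uses the spelling from <stdbool.h>; same type, same conversion. */
static int via_stdbool(int v)
{
    bool b = v;
    return b;
}

/* Usage sketch. */
void bool_examples(void)
{
    assert(via_keyword(5) == 1);
    assert(via_keyword(0) == 0);
    assert(via_stdbool(-3) == 1);
    assert(sizeof(bool) == sizeof(_Bool));
}
```

Because the friendlier name only appears after an explicit include, C89-era code that defined its own bool was not broken by the new keyword.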

Identifier (annotated)

Reference: https://en.cppreference.com/w/c/language/identifier

On the C99 definition of an identifier, 6.4.2.1:

An identifier is a sequence of nondigit characters (including the underscore _, the lowercase and uppercase Latin letters, and other characters) and digits.

C99 also allows universal character names (UCNs), which start with \u or \U, as identifier characters (including as the first character). \u is followed by 4 hex digits and \U by 8 hex digits. The rule above that an identifier cannot start with a digit still applies:

Each universal character name in … Continue reading
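A short sketch of UCNs inside identifiers, assuming a compiler that accepts them (recent GCC and Clang releases do in their default modes). The variable and function names are made up for illustration; \u00e9 designates the letter é.

```c
#include <assert.h>

/* Both spellings below name the character U+00E9. */
static int ucn_identifier_demo(void)
{
    int \u00e9t\u00e9 = 40;    /* identifier that starts with a UCN (4 hex digits) */
    int caf\U000000e9 = 2;     /* \U form takes 8 hex digits */
    return \u00e9t\u00e9 + caf\U000000e9;
}

/* Usage sketch. */
void ucn_identifier_example(void)
{
    assert(ucn_identifier_demo() == 42);
}
```

Local variables are used so that the example does not depend on how a linker treats extended characters in external names.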

C99 Terms, definitions, and symbols (annotated)

A summary of the key points of the terms defined in C99's Terms, definitions, and symbols clause; there are 19 in total.

3.1 access

In the standard, access means reading or modifying an object. Modify here includes storing the same value into the object:

‘Modify’ includes the case where the new value being stored is the same as the previous value

Also, an expression that is not evaluated performs no access (for example under short-circuit evaluation):

Expressions that are not evaluated do not access objects.

3.2 … Continue reading
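The point about unevaluated expressions can be observed with a counter. In this sketch (all names hypothetical) bump modifies counter only if it is actually evaluated, so the final value shows which operands were skipped.

```c
#include <assert.h>

static int counter;   /* the object whose modification we observe */

/* Modifies counter when, and only when, the call is evaluated. */
static int bump(void)
{
    counter++;
    return 1;
}

/* None of the bump() calls here is evaluated, so counter is never accessed. */
static int skipped_accesses(void)
{
    counter = 0;
    (void)(0 && bump());      /* right operand of && skipped */
    (void)(1 || bump());      /* right operand of || skipped */
    (void)sizeof(bump());     /* operand of sizeof is not evaluated */
    return counter;
}

/* Here the right operand is evaluated, so counter is modified once. */
static int performed_access(void)
{
    counter = 0;
    (void)(1 && bump());
    return counter;
}

/* Usage sketch. */
void access_examples(void)
{
    assert(skipped_accesses() == 0);
    assert(performed_access() == 1);
}
```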

Phases of translation (annotated)

C99 divides translation into 8 phases. The translation phases describe the processing flow from C source code to a program image. Compiled from https://en.cppreference.com/w/c/language/translation_phases and the ISO C99 standard.

The C source file is processed by the compiler as if the following phases take place, in this exact order. Actual implementation may combine these actions or process them … Continue reading
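Some phases leave effects that are easy to see in ordinary code. The sketch below is an illustration only, with hypothetical names, and touches three of them: line splicing in phase 2, macro expansion in phase 4, and string literal concatenation in phase 6.

```c
#include <assert.h>
#include <string.h>

/* Phase 2: a backslash immediately followed by a newline is deleted,
 * so this macro definition is one logical line. */
#define SUM_OF_THREE 1 + \
                     2 + \
                     3

/* Phase 6: adjacent string literal tokens are concatenated. */
static const char *const greeting = "hello, " "world";

/* Phase 4 expands SUM_OF_THREE before phase 7 parses the expression.
 * Returns 1 when both observations hold. */
static int phases_demo(void)
{
    return (SUM_OF_THREE) == 6 && strcmp(greeting, "hello, world") == 0;
}

/* Usage sketch. */
void phases_example(void)
{
    assert(phases_demo() == 1);
}
```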

ASCII (annotated)

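Some additional notes on https://en.cppreference.com/w/c/language/ascii.

ASCII defines 0x00-0x7F. Anything beyond that (the 8th bit) belongs to an extended 8-bit set whose definition depends on the standard in use; early systems sometimes used that bit as a parity bit.

There are 128 characters in total, 95 of them printable (0x20-0x7E); the first 32 and the last one are control characters (0x00-0x1F/0-31, 0x7F/127).

The digits are laid out to match the BCD bit pattern (with 011 -> hex 3 as the high bits): 0 -> 0x30, 9 -> 0x39, so & 0x0F yields the digit value.

The table can be read as four blocks of 32 characters each. The first block holds the control characters. The ordering of the second block (digits) and the third block (uppercase letters) has historical reasons and can be compared with DEC SIXBIT below. The fourth block holds the lowercase letters.

EBCDIC (pronounced eb-SEE-dick) is also listed above for comparison. It is used mainly on IBM mainframes; note that most of its printable characters sit in the upper half and that A-Z and a-z are not contiguous.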
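The layout properties above translate into simple bit operations. The following sketch assumes an ASCII execution character set, and the helper names are hypothetical rather than standard library functions.

```c
#include <assert.h>

/* Printable ASCII is the contiguous range 0x20..0x7E. */
static int is_ascii_printable(int c)
{
    return c >= 0x20 && c <= 0x7E;
}

/* Digits are 0x30..0x39, so the low nibble is the digit value. */
static int ascii_digit_value(char c)
{
    assert(c >= '0' && c <= '9');
    return c & 0x0F;
}

/* Uppercase and lowercase sit in adjacent 32-character blocks,
 * so they differ only in bit 0x20. */
static char ascii_to_lower(char c)
{
    return (c >= 'A' && c <= 'Z') ? (char)(c | 0x20) : c;
}

/* Counts the printable characters among the 128 ASCII codes. */
static int count_ascii_printable(void)
{
    int n = 0;
    for (int c = 0; c < 128; c++)
        n += is_ascii_printable(c);
    return n;
}

/* Usage sketch. */
void ascii_examples(void)
{
    assert(count_ascii_printable() == 95);
    assert(ascii_digit_value('7') == 7);
    assert(ascii_to_lower('A') == 'a');
}
```

None of these shortcuts hold on an EBCDIC system, where the letters are not contiguous.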

C/C++ comments (annotated)

Key points collected from https://en.cppreference.com/w/c/comment.

All comments are removed from the program at translation phase 3 by replacing each comment with a single whitespace character.

This describes what translation phase 3 does with a comment: it is replaced by one space character. For translation phase 3, see C99 §5.1.1.2 (p.10):

3. The source file is decomposed into preprocessing tokens and … Continue reading
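The replacement of a comment by a single space can be seen in how tokens are separated. In this sketch (function names hypothetical) the empty comments keep the two minus signs apart, which would not be the case if comments were simply deleted.

```c
#include <assert.h>
#include <string.h>

static int comment_becomes_space(void)
{
    int x = 6, y = 3;
    /* Each empty comment below acts as one space, so the tokens are x - -y.
     * If comments simply vanished, the text would read x--y instead. */
    return x/**/-/**/-y;
}

static const char *not_a_comment(void)
{
    return "a//b";   /* "//" inside a string literal does not start a comment */
}

/* Usage sketch. */
void comment_examples(void)
{
    assert(comment_becomes_space() == 9);    /* 6 - (-3) */
    assert(strlen(not_a_comment()) == 4);    /* all four characters survive */
}
```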
