Transactional Information Systems 5章 MVCC勉強会　第一回　議論メモ

勉強会について

Transactional Information Systems 5章 MVCC勉強会　第一回の議論メモです。

自分のメモをベースにまとめています。発言の聞き間違い、解釈違いの可能性があることをご了承ください。

特記の無い引用は本で議論した箇所を示しています。

Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control and Recovery (The Morgan Kaufmann Series in Data Management Systems)

作者: Gerhard Weikum,Gottfried Vossen
出版社/メーカー: Morgan Kaufmann
発売日: 2001/05/24
メディア: ハードカバー
クリック: 2回
この商品を含むブログ (2件) を見る

[memo]は私のメモです。勉強会で議論された内容ではないことにご注意ください。
また、関連資料・引用・リンクを適宜挿入しています。

勉強会の目的・方針

write に強い DB を作りたい。現状は write が遅い。

Multi Version (MV) なら Write Lock をとらずに write できるはず。だが、現状の実装に Write Lock をとらないものはない。この状況を変えたい。

今回の勉強会は読むよりディスカッションを重視したい。

5.1 Goal and Overview

So far the underlying assumption has been that data items exist in exactly one copy each.

この本でここまでは1つのコピーが前提だった。

concurrent な間は MV だが未来永劫 MV なわけではない。commit したら Single Version (SV) になる。

read は read from、つまり何を読むかが大事。

It will turn out that versioning demands some reasonable extensions of serializability theory, and, as will be shown in Part Ⅲ, versioning is connected to recovery, as versions of modified data (before images or after images, or both) are sometimes maintained in some form by a recovery manager anyway.

recovery でも version の考え方を導入しないといけない。例えば、write の before / after を利用して recovery するのは実質 MV ともいえる。
だが、本来は recovery と Concurrency Control (CC) を分けて考えないといけない。この点を意識して本を読み進める必要がある。

昔（本が想定している時代）と今では commit の前提が変わっているのではないか？
昔の commit はどこまで recovery しなければならないかを示すものだった。
今は oneshot request / group commit が前提となっている。また、write の実行順序の制御に commit が利用されるようになってきていると思われる。

w₁w₂c₂c₁ というスケジュールを SV では実現できない。w₁ を w₂ で上書きした後、c₂ よりも後に c₁ が来ているからだ。だが、MV なら可能だ。

We will also look at the storage space needed to hold multiple versions.

Storage Space とはどこを指しているのか？
本の時代はディスクベースでメモリはバッファとして利用していた。
現在はメモリベースとなっている。
この前提で考えた時にどうか？

While theoretical considerations tend to assume space to be an unlimited resource, it turns out that the picture changes significanty should this not be valid anymore. In particular, we will show how multiversion serializability is affected by limiting the (total) number of versions around.

commit した後も version の保存は必要なのだろうか？recovery ができるのであれば、必要としているより古い version は捨ててよいのではないか？

MVは resource というよりデータ構造が議論の的になるのではないか？

lost update は view の anomaly だ。update の事実が lost しているわけではない。MV なら何も考えずに write を行うことができるので、writeを lost するということもない。

w₀(x)c₀r₁(x)r₂(x)w₁(x)w₂(x)c₁c₂
このスケジュールの問題点は t₁ も t₂ も update するつもりなのに、同じ item の同じ version を t₁ / t₂ が read していることではないか？

An important assumption we will make in this chapter is that versioning is transparent to the outside world. In other words, users or applications will not be aware of the fact that data items can appear in multiple versions.

ユーザーサイドに version を意識させることはない。でも、一部例外はある。
version のデータベースがある。MVCC とそれを混同してはならない。

[memo]
version のデータベースにはバージョン管理システム（Subversion, Git 等）のようなものも含むのか？

この点について記事を公開後にコメントをいただきました。version database は従来の relational database の機能に加え、version の概念も取り込んだ database とのことです。

One alternative is to use a source code version control system like git or svn to manage dataset versions. However, source code version control systems are both inefficient at storing unordered structured datasets, and do not support advanced querying capabilities, e.g., querying for versions that satisfy some predicate, performing joins across versions, or computing some aggregate statistics across versions [12].
（中略）
To answer this question we develop a system, ORPHEUSDB1 , to “bolt-on” versioning capabilities to a traditional relational database system that is unaware of the existence of versions. By doing so, we seamlessly leverage the analysis and querying capabilities that come “for free” with a database system, along with efficient versioning capabilities.
http://www.vldb.org/pvldb/vol10/p1130-huang.pdf

CASE (computer-aided software engineering) とはモデルを描くだけでコードが生成されるような古のツールのこと。
Computer Aided Software Engineering - Wikipedia

The discussion in this chapter continues to consider unaborted transactions only.

unabort を前提にするのはよくない（考慮すべき）。

ここでの abort は system abort ではなく user abort を指している。
serializable であるかの validation に失敗した時は system abort となる。

system commit とはDBが決める commit のことだ。この本の "commit" の前提でもある。
user commit とはユーザーが意思表示する commit のことだ。

[memo]
user commit はクライアント側が発行する commit のことであり、system commit は DB 側が CC を実現するために実行する commit のことと理解すればよいか？DB は CC のためにクライアント側の commit 要求の順序を変更することが可能なため。

これからの DB は 1 SQL 1 commit でよくないか？つまり、インタラクティブな操作を認めない。AutoCommit のイメージ。そうでないと速くすることはできない。

5.2 Multiversion Schedules

EXAMPLE 5.1
※一部を抜粋
s = r₁(x)w₁(x)r₂(x)w₂(y)r₁(y)w₁(z)c₁c₂
s ∉ CSR
r₁(y) is "too late" for making the schedule that already has conflict (on x) between t₁ and t₂ an acceptable one.
Now suppose that the old value of y (in this the initial value of y) were still available when r₁(y) arrived.
s' = r₁(x)w₁(x)r₁(y)r₂(x)w₂(y)w₁(z)c₁c₂
s' ∈ CSR

s の r₁(y) を前に持っていくだけでCSRになるのであれば、何とか実現する方法があるのではないか？

read で何を読むかを決めなければいけない。

先に進むには Conflict Serializable (CSR) を理解しておく必要がある。
CSR の conflict の定義とは r-w, w-r, w-w であり、つまり2つの transaction (TX) が同じ item を操作する際に一方が write であるということだ。

データ処理の競合関係「だけ」に注目した解釈を導入します。この競合(conflict)を特にデータの書き込みについてのみ注目します。すなわち
w-w　あるTX書き込んで、別のTXがまた書き込む
w-r　あるTXが書き込んだものを、別のTXが読み込む
r-w　あるTXが読み込んだものを、別のTXが書き込む
この競合関係が維持されていれば、同一のスケジュールとみなすという「解釈」です。まあ、ぶっちゃけr-rの関係は順序は関係ないでしょ？という発想ですね。すなわち、書き込みの競合関係が同一である集合に属していれば、そのスケジュールは、同じコンテクストであると意味です。この集合をconflict serializableといいます。
Welcome back to the TRANSACTION! - 急がば回れ、選ぶなら近道

SV の conflict をイメージしすぎると MV で混乱することになる。

実装はCSRでもMVSRでも両者ともどもエッジグラフでのvalidationになるので、結果としては「同じ」read-setとwrite-setのintersectionの話になっているところがややこしい。やっていることは似ているが、導かれているクライテリアは異なるということだ。
Multi-version Conflict Serializability - 急がば回れ、選ぶなら近道

本の P99 に CSR で許される並び替えルール (C1 - C4) が記述されている。

Our presentation initially considers only totally ordered schedules since these are closer to the "algebraic" nature of the following commutativity rules for page model data operations. In these rules "～" means that the ordered pair of actions on the left-hand side can be replaced by the right-hand side, and vice versa.
Rule C1: r_i(x)r_j(y) ～ r_j(y)r_i(x) if i ≠ j
Rule C2: r_i(x)w_j(y) ～ w_j(y)r_i(x) if i ≠ j, x ≠ y
Rule C3: w_i(x)w_j(y) ～ w_j(y)w_i(x) if i ≠ j, x ≠ y

This additional rule simply states that two unordered operations can be arbitrarily ordered if they are nonconflicting. We refer to this rule as the ordering rule:
Rule C4:
o_i(x), p_j(y) unordered
↝ o_i(x)p_j(y) if x ≠ y ∨ (o = r ∧ p = r)

View Serializable (VSR) は Herbrand Semantics で表現できる。
Serializablitiyとは？ 2. View Serializabilityについて - Qiita

read-from が変わらないことは CSR であることの一つの条件である。
C1 - C4 のルールを破ると read-from が壊れてしまう。正確には壊す可能性があるということであり、厳密にはr_∞を考える必要がある。

Basically, a read step of the form r(x) reads an (existing) version of x, and a write step of the form w(x) (always) creates a new version of x (or overwrites an existing one).

r(x) の existing を最新の version の事とは断言していない。複数 version が存在する可能性を残している。
例えば group commit で w-r-w を考えると、read は通常であれば前の write をreadするが、後の write を read することにしても良い。
PostgreSQL には DEFERRABLE READ ONLY があるが、これは後の write を読むのと同じことを実現しているように思える。

この本での history とは逐次実行の事である。verify で並べ替えるのはまた別の話となる。

MV では read 発行時に version が決まることはない。取りうるスケジュールは複数ある。Rule C4 は半順序から全順序を生成するルールとなる。

C4だけが特殊なルールとして存在する．

C1,C2,C3は暗黙的にスケジュール/ヒストリの命令列に全順序があることを要求している．

しかしConcurrency Controlではタプルごとの命令の半順序しか得られないことが多い．

半順序から全順序を生成するルールが以下のC4.

Reducibility - nikezono

DEFINITION 5.1 Version Function
Let s be a history with initialization transaction t₀ and final transaction t_∞. A version function for s is a function h, which associates with each read step of s a previous write step on the same data item, and which is the identity on write steps.

read version の決め方にはいくつもある。例えば、
・自分の write
・直前の write
・commit 済みの write
のようにいくつも取りうる。

Commit, Install, TX の order はそれぞれ異なる。

DEFINITION 5.1 で h は Herbrand をイメージしているのではないか？history に h ではなく s を割り当てていることからもうかがえる。

the result of applying a version function to a read operation could be any of (1) the associated write operation, (2) the version assigned for reading, or (3) the "versioned" read operation. Since all three possibilities are basically equivalent, this should not cause any confusion.

(1)～(3) が equivalent とは？なぜあえて説明しているのか？そこにどんな意図があるのか？
(2) は read そのものが version を指定する場合。
(3) は TX そのものの version を read の version に適用する場合。

w₁w₂c₁
latest なものを read するのであれば w₂ の version となる。
だが、commit したものを read するのであれば w₁ のversion となる。

本で想定している実装は貧弱。なので、全体的に実装を避けて書いているようにも見える。

参考

Transactional Information Systems 5章 MVCC勉強会第一回に参加 - Yabu.log
yuYabu さんが勉強会のメモを早々にまとめられていました。仕事が早い！

Transactional Information Systems Chapter 5読み会 - nikezono
中園さんの勉強会メモ。

デヴすぴスラ (@dev_supisula) | Twitter
デヴすぴスラさんが Twitter で勉強会の内容を踏まえた考察を重ねられています。

更新情報

2018/11/29

version database について追記しました。
中園さんの勉強会メモのリンクを追記しました。

ぱと隊長日誌

ブログ運用もエンジニアとしての生き方も模索中