#author("2019-05-25T11:51:57+00:00","default:haikikyou","haikikyou")
#author("2019-05-26T07:33:00+00:00","default:haikikyou","haikikyou")
[[PostgreSQL/解析]]
#contents
* アーカイブ [#c2eb60f9]
データベースをPITR(ポイントタイムリカバリ)で任意の地点に復旧させるのにデータベースの先行書き込みログ(WAL)が必要となる。アーカイブ設定を行なうことで、PostgreSQLの機能により、walディレクトリのWALを定期的にバックアップ領域に保存できるようになる。PITRでは、ファイルシステムレベルのバックアップ(ex: pg_basebackup)とWALバックアップを用いて復旧させる。
* Archiver [#uad4ce3c]
&ref(./archiver.png);
- WALアーカイブを実行しているのは、Archiverプロセスである。
- Archiverプロセスは、起動後はメインループ内でアーカイブコピーを繰り返し行なう。
- メインループはLatchでwaitし、およそ&code(){PGARCH_AUTOWAKE_INTERVAL};(60秒)ごとにwaitから戻る。Latchのタイムアウトは以下で計算される。last_copy_timeは最後にアーカイブされた時間、curtimeはLatchでwaitする前に取得される時間。よってアーカイブされない時間がPGARCH_AUTOWAKE_INTERVAL秒を経過するとSetLatchでLatchが外れ(TIMEOUT)、&code(){wakened = true};となる。
PGARCH_AUTOWAKE_INTERVAL - (curtime - last_copy_time)
&label(warn){参考}; [[backend/postmaster/pgarch.c#pgarch_MainLoop()>https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/postmaster/pgarch.c;h=dc2739bf36db4bbc318a82dba42f8e34d58b68ef;hb=refs/heads/REL_10_STABLE#l367]]
- archive_statusディレクトリから、.readyがサフィックスであるファイルを探す。~
historyファイルが存在する場合はそれが優先され、複数のhistoryファイルが存在する場合は、最も古いhistoryファイルが優先される。~
複数のWALが.readyになっている場合、一度にファイル単位でarchive_commandが実行される。1つのWALをアーカイブして、再びwaitするという訳ではない。よって、何らかの要因でarchive_commandに失敗し続け正常に戻った場合は、一斉にコピーが走る。
** アーカイブコピーの契機 [#refb53f8]
- Latchでタイムアウト秒経過する
- シグナル(SIGUSR1)を受けとる(SetLatchでLatchが外れる)
-- Postmasterはsigusr1_handlerで&code(){PMSIGNAL_WAKEN_ARCHIVER};シグナルであることを確認し、ArchiverにSIGUSR1を送る。Archiverは、シグナルハンドラで&code(){wakened = true};にフラグ更新。Latchが外れアーカイブコピーを実行する。~
ArchiverへのNotificationは、&code(){XLogArchiveNotify};関数で行われる。&code(){.ready};ファイルを作成し、&code(){SendPostmasterSignal(PMSIGNAL_WAKEN_ARCHIVER)};でsigusr1を送る、という流れの1セットとなっている。
&label(blockquote){引用}; [[XLogArchiveNotify()>https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/transam/xlogarchive.c;h=7825cfe532e2436115f26feee8d9502a24f340e1;hb=refs/heads/REL_10_STABLE#l491]]
#geshi(c){{{
/*
* XLogArchiveNotify
*
* Create an archive notification file
*
* The name of the notification file is the message that will be picked up
* by the archiver, e.g. we write 0000000100000001000000C6.ready
* and the archiver then knows to archive XLOGDIR/0000000100000001000000C6,
* then when complete, rename it to 0000000100000001000000C6.done
*/
void
XLogArchiveNotify(const char *xlog)
{
char archiveStatusPath[MAXPGPATH];
FILE *fd;
/* insert an otherwise empty file called <XLOG>.ready */
StatusFilePath(archiveStatusPath, xlog, ".ready");
fd = AllocateFile(archiveStatusPath, "w");
if (fd == NULL)
{
ereport(LOG,
(errcode_for_file_access(),
errmsg("could not create archive status file \"%s\": %m",
archiveStatusPath)));
return;
}
if (FreeFile(fd))
{
ereport(LOG,
(errcode_for_file_access(),
errmsg("could not write archive status file \"%s\": %m",
archiveStatusPath)));
return;
}
/* Notify archiver that it's got something to do */
if (IsUnderPostmaster)
SendPostmasterSignal(PMSIGNAL_WAKEN_ARCHIVER);
}
/*
* Convenience routine to notify using segment number representation of filename
*/
void
XLogArchiveNotifySeg(XLogSegNo segno)
{
char xlog[MAXFNAMELEN];
XLogFileName(xlog, ThisTimeLineID, segno);
XLogArchiveNotify(xlog);
}
}}}
** WALアーカイブ処理の流れ [#mf17460f]
- archive_statusディレクトリに.readyファイルがあるかチェックする
- archive_commandがセットされているか確認する。コマンドがセットされていない場合はアーカイブを実行しない。WARNINGが出る。
archive_mode enabled, yet archive_command is not set
- walディレクトリから、.readyで示されるwalログファイルを探す。該当するファイルが存在しない場合、孤児となった.readyを消す。消すのに失敗した場合、1秒のインターバルで&code(){NUM_ORPHAN_CLEANUP_RETRIES};(3回)失敗するまで試行する。
- WALアーカイブ
-- WALアーカイブは、&code(){archive_command};で指定されているコマンドを&code(){system};関数でキックすることで行われる。
-- WALアーカイブに成功した場合は、.readyを.doneにする。失敗した場合は、1秒のインターバルを置いてリトライする。3回失敗すると、少し長いインターバルPGARCH_AUTOWAKE_INTERVAL(60秒)のwaitの後、再度アーカイブコピーを実行する。待ちの間にシグナルで起こされる場合(SetLatchされる)は、PGARCH_AUTOWAKE_INTERVALより早く待ちから戻る。~
&label(warn){参考}; [[pgarch_ArchiverCopyLoop()>https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/postmaster/pgarch.c;h=dc2739bf36db4bbc318a82dba42f8e34d58b68ef;hb=refs/heads/REL_10_STABLE#l414]]
- 統計情報コレクタに、Archiverの統計情報を送る(pg_stat_archiverビュー)。
&label(warn){参考};
- [[backend/postmaster/pgarch.c>https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/postmaster/pgarch.c;h=dc2739bf36db4bbc318a82dba42f8e34d58b68ef;hb=refs/heads/REL_10_STABLE]] - &size(11){&color(gray){https://git.postgresql.org/gitweb/?p=postgresql.git;a=shortlog;h=refs/heads/REL_10_STABLE};};
** pg_stat_archiverビュー [#b8e9fcf6]
- pg_stat_archiverビューでは、以下のコマンドで示されるような情報を取得することができる。ただし、resetすると情報はクリアされるので、運用でアーカイブの失敗を監視する際には考慮に入れる必要があるだろう。
- また、failed_countは、archive_commandの失敗では3回リトライするため、3ずつインクリメントされる。
- 今のところ時間と回数が取れる。失敗種別までは判別できない。~
終了コードで判別できたりすると何かの役に立ちそうか??
#geshi(sql){{{
-- Archiverの統計情報確認
select * from pg_stat_archiver
-[ RECORD 1 ]------+------------------------------
archived_count | 1
last_archived_wal | 000000010000000000000001
last_archived_time | 2019-03-20 21:44:59.918306+09
failed_count | 0
last_failed_wal |
last_failed_time |
stats_reset | 2019-03-20 21:42:17.409563+09
-- リセット
select pg_stat_reset_shared('archiver')
-[ RECORD 1 ]--------+-
pg_stat_reset_shared |
-- リセットした後
select * from pg_stat_archiver
-[ RECORD 1 ]------+------------------------------
archived_count | 0
last_archived_wal |
last_archived_time |
failed_count | 0
last_failed_wal |
last_failed_time |
stats_reset | 2019-04-20 21:46:54.795811+09
}}}
意図的にアーカイブ先を存在しない場所に指定して、archive_commandの実行を何度か失敗させてみる。失敗した場合は、1秒ごとのインターバルで3回失敗するまで試行される。
#geshi{{{
----- 失敗
2019-03-21 23:24:00.156 JST [20894] DEBUG: executing archive command "cp pg_wal/000000010000000000000002 "/Users/guest/workspace/archivedir/000000010000000000000002""
cp: /Users/guest/workspace/archivedir/000000010000000000000002: No such file or directory
2019-03-21 23:24:00.160 JST [20894] LOG: archive command failed with exit code 1
2019-03-21 23:24:00.160 JST [20894] DETAIL: The failed archive command was: cp pg_wal/000000010000000000000002 "/Users/guest/workspace/archivedir/000000010000000000000002"
2019-03-21 23:24:01.161 JST [20894] DEBUG: executing archive command "cp pg_wal/000000010000000000000002 "/Users/guest/workspace/archivedir/000000010000000000000002""
cp: /Users/guest/workspace/archivedir/000000010000000000000002: No such file or directory
2019-03-21 23:24:01.165 JST [20894] LOG: archive command failed with exit code 1
2019-03-21 23:24:01.165 JST [20894] DETAIL: The failed archive command was: cp pg_wal/000000010000000000000002 "/Users/guest/workspace/archivedir/000000010000000000000002"
2019-03-21 23:24:02.166 JST [20894] DEBUG: executing archive command "cp pg_wal/000000010000000000000002 "/Users/guest/workspace/archivedir/000000010000000000000002""
cp: /Users/guest/workspace/archivedir/000000010000000000000002: No such file or directory
2019-03-21 23:24:02.170 JST [20894] LOG: archive command failed with exit code 1
2019-03-21 23:24:02.170 JST [20894] DETAIL: The failed archive command was: cp pg_wal/000000010000000000000002 "/Users/guest/workspace/archivedir/000000010000000000000002"
----- 60秒後
2019-03-21 23:25:02.169 JST [20894] DEBUG: executing archive command "cp pg_wal/000000010000000000000002 "/Users/guest/workspace/archivedir/000000010000000000000002""
cp: /Users/guest/workspace/archivedir/000000010000000000000002: No such file or directory
2019-03-21 23:25:02.173 JST [20894] LOG: archive command failed with exit code 1
2019-03-21 23:25:02.173 JST [20894] DETAIL: The failed archive command was: cp pg_wal/000000010000000000000002 "/Users/guest/workspace/archivedir/000000010000000000000002"
2019-03-21 23:25:03.174 JST [20894] DEBUG: executing archive command "cp pg_wal/000000010000000000000002 "/Users/guest/workspace/archivedir/000000010000000000000002""
cp: /Users/guest/workspace/archivedir/000000010000000000000002: No such file or directory
2019-03-21 23:25:03.178 JST [20894] LOG: archive command failed with exit code 1
2019-03-21 23:25:03.178 JST [20894] DETAIL: The failed archive command was: cp pg_wal/000000010000000000000002 "/Users/guest/workspace/archivedir/000000010000000000000002"
2019-03-21 23:25:04.179 JST [20894] DEBUG: executing archive command "cp pg_wal/000000010000000000000002 "/Users/guest/workspace/archivedir/000000010000000000000002""
cp: /Users/guest/workspace/archivedir/000000010000000000000002: No such file or directory
2019-03-21 23:25:04.183 JST [20894] LOG: archive command failed with exit code 1
2019-03-21 23:25:04.183 JST [20894] DETAIL: The failed archive command was: cp pg_wal/000000010000000000000002 "/Users/guest/workspace/archivedir/000000010000000000000002"
}}}
失敗と成功時の統計情報の更新状況を確認してみる。
#geshi(sql){{{
-- 失敗
select * from pg_stat_archiver
-[ RECORD 1 ]------+------------------------------
archived_count | 0
last_archived_wal |
last_archived_time |
failed_count | 36
last_failed_wal | 000000010000000000000002
last_failed_time | 2019-03-21 23:28:10.222111+09
stats_reset | 2019-03-20 21:46:54.795811+09
-- 失敗
-- failed_countは +3
-- last_failed_timeは、+62 (sec)
select * from pg_stat_archiver
-[ RECORD 1 ]------+------------------------------
archived_count | 0
last_archived_wal |
last_archived_time |
failed_count | 39
last_failed_wal | 000000010000000000000002
last_failed_time | 2019-03-21 23:29:12.234861+09
stats_reset | 2019-03-20 21:46:54.795811+09
... 省略
-- 成功
-- archived_countは、溜まっていたWALファイルの数で6
select * from pg_stat_archiver
-[ RECORD 1 ]------+------------------------------
archived_count | 6
last_archived_wal | 000000010000000000000007
last_archived_time | 2019-03-21 23:31:14.954457+09
failed_count | 42
last_failed_wal | 000000010000000000000002
last_failed_time | 2019-03-21 23:30:14.245314+09
stats_reset | 2019-03-20 21:46:54.795811+09
}}}
&label(warn){参考};
- [[pg_stat_archiverビュー>https://www.postgresql.jp/document/10/html/monitoring-stats.html#PG-STAT-ARCHIVER-VIEW]] - &size(11){&color(gray){on https://www.postgresql.jp/document/10/html/monitoring-stats.html#PG-STAT-ARCHIVER-VIEW};};
* WALセグメントの強制切り替え [#n008aed4]
** pg_switch_wal [#o8c5fda8]
&code(){pg_switch_wal};関数は、強制的にWALセグメントの切り替えを行なう。~
&code(){pg_waldump};でWALの内部を解析すると、XLOG rmgrのinfo値で&code(){XLOG_SWITCH};という種別のWALが書き込まれる。
#geshi{{{
rmgr: XLOG len (rec/tot): 24/ 24, tx: 0, lsn: 0/18000060, prev 0/18000028, desc: SWITCH
}}}
連続実行してもWAL位置に変更がなければスイッチはされない。その場合は、同じSWITCHポイントが戻る。
#geshi(sql){{{
# select pg_switch_wal();を実行している
postgres=# \watch 1
土 4/20 17:38:51 2019 (every 1s)
pg_switch_wal
---------------
0/45000078
(1 row)
土 4/20 17:38:55 2019 (every 1s)
pg_switch_wal
---------------
0/46000078
(1 row)
土 4/20 17:38:56 2019 (every 1s)
pg_switch_wal
---------------
0/47000000
(1 row)
土 4/20 17:38:57 2019 (every 1s)
pg_switch_wal
---------------
0/47000000
(1 row)
土 4/20 17:38:58 2019 (every 1s)
pg_switch_wal
---------------
0/47000000
(1 row)
土 4/20 17:38:59 2019 (every 1s)
pg_switch_wal
---------------
0/47000000
(1 row)
土 4/20 17:39:00 2019 (every 1s)
pg_switch_wal
---------------
0/47000000
(1 row)
}}}
&label(warn){参考};
- [[backend/access/transam/xlog.c#XLogFlush()>https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/transam/xlog.c;h=a93017feecffae7a7ff65d40b78a89556d55eb51;hb=refs/heads/REL_10_STABLE#l2800]] - &size(11){&color(gray){https://git.postgresql.org/gitweb/?p=postgresql.git;a=shortlog;h=refs/heads/REL_10_STABLE};};
- [[9.26.3. バックアップ制御関数>https://www.postgresql.jp/document/10/html/functions-admin.html]] - https://www.postgresql.jp/document/10/html/functions-admin.html
SWITCHによるWALセグメント切り替えを行なうと、WALセグメントはセグメントサイズまで消費される。
&label(blockquote){引用}; [[backend/access/transam/xlog.c#CopyXLogRecordToWAL()>https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/transam/xlog.c;h=a93017feecffae7a7ff65d40b78a89556d55eb51;hb=refs/heads/REL_10_STABLE#l1541]]
#geshi(c){{{
// backend/access/transam/xlog.c#CopyXLogRecordToWAL()
/*
* If this was an xlog-switch, it's not enough to write the switch record,
* we also have to consume all the remaining space in the WAL segment. We
* have already reserved that space, but we need to actually fill it.
*/
if (isLogSwitch && XLogSegmentOffset(CurrPos, wal_segment_size) != 0)
{
/* An xlog-switch record doesn't contain any data besides the header */
Assert(write_len == SizeOfXLogRecord);
/* Assert that we did reserve the right amount of space */
Assert(XLogSegmentOffset(EndPos, wal_segment_size) == 0);
/* Use up all the remaining space on the current page */
CurrPos += freespace;
/*
* Cause all remaining pages in the segment to be flushed, leaving the
* XLog position where it should be, at the start of the next segment.
* We do this one page at a time, to make sure we don't deadlock
* against ourselves if wal_buffers < wal_segment_size.
*/
while (CurrPos < EndPos)
{
/*
* The minimal action to flush the page would be to call
* WALInsertLockUpdateInsertingAt(CurrPos) followed by
* AdvanceXLInsertBuffer(...). The page would be left initialized
* mostly to zeros, except for the page header (always the short
* variant, as this is never a segment's first page).
*
* The large vistas of zeros are good for compressibility, but the
* headers interrupting them every XLOG_BLCKSZ (with values that
* differ from page to page) are not. The effect varies with
* compression tool, but bzip2 for instance compresses about an
* order of magnitude worse if those headers are left in place.
*
* Rather than complicating AdvanceXLInsertBuffer itself (which is
* called in heavily-loaded circumstances as well as this lightly-
* loaded one) with variant behavior, we just use GetXLogBuffer
* (which itself calls the two methods we need) to get the pointer
* and zero most of the page. Then we just zero the page header.
*/
currpos = GetXLogBuffer(CurrPos);
MemSet(currpos, 0, SizeOfXLogShortPHD);
CurrPos += XLOG_BLCKSZ;
}
}
else
{
/* Align the end position, so that the next record starts aligned */
CurrPos = MAXALIGN64(CurrPos);
}
}}}
また、強制切り替えを行なった後、切り替わりでSWITCHが書き込まれたWALログは、即座にアーカイブできるようにしている。内部的にはPostmasterに対し、Archiverを起こすためのシグナルを送っている。(&code(){PMSIGNAL_WAKEN_ARCHIVER};)
#geshi{{{
postgres=# select pg_switch_wal();
pg_switch_wal
---------------
0/18000078
(1 row)
}}}
pg_switch_wal実行時に出力されるログは以下(logレベルdebug5)。
#geshi{{{
2019-03-20 15:20:12.692 JST [51362] DEBUG: StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-03-20 15:20:14.895 JST [51362] DEBUG: CommitTransaction(1) name: unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-03-20 15:20:14.895 JST [50570] DEBUG: executing archive command "cp pg_wal/000000010000000000000030 "/Users/guest/archivedir_p/000000010000000000000030""
2019-03-20 15:20:14.950 JST [50570] DEBUG: archived write-ahead log file "000000010000000000000030"
}}}
以下は、XLogArchiveNotify実行時のバックトレース(master: pg12)。
#geshi(c){{{
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 17.1
* frame #0: 0x0000000106fe89ed postgres`XLogArchiveNotify(xlog="000000010000000000000019") at xlogarchive.c:531
frame #1: 0x0000000106fe8afd postgres`XLogArchiveNotifySeg(segno=25) at xlogarchive.c:543
frame #2: 0x0000000106fd1015 postgres`XLogWrite(WriteRqst=(Write = 436207616, Flush = 436207616), flexible=false) at xlog.c:2540
frame #3: 0x0000000106fd027b postgres`XLogFlush(record=436207616) at xlog.c:2921
frame #4: 0x0000000106fcf34d postgres`XLogInsertRecord(rdata=0x000000010783d1d0, fpw_lsn=0, flags='\0') at xlog.c:1150
frame #5: 0x0000000106febd0f postgres`XLogInsert(rmid='\0', info='@') at xloginsert.c:462
frame #6: 0x0000000106fdf8a5 postgres`RequestXLogSwitch(mark_unimportant=false) at xlog.c:9387
frame #7: 0x0000000106fe9c79 postgres`pg_switch_wal(fcinfo=0x00007f8848808880) at xlogfuncs.c:292
frame #8: 0x00000001071927ac postgres`ExecInterpExpr(state=0x00007f8848808798, econtext=0x00007f8848808488, isnull=0x00007ffee8d3d81f) at execExprInterp.c:625
frame #9: 0x0000000107191b92 postgres`ExecInterpExprStillValid(state=0x00007f8848808798, econtext=0x00007f8848808488, isNull=0x00007ffee8d3d81f) at execExprInterp.c:1769
frame #10: 0x00000001071e2a3b postgres`ExecEvalExprSwitchContext(state=0x00007f8848808798, econtext=0x00007f8848808488, isNull=0x00007ffee8d3d81f) at executor.h:312
frame #11: 0x00000001071e29be postgres`ExecProject(projInfo=0x00007f8848808790) at executor.h:346
frame #12: 0x00000001071e26f3 postgres`ExecResult(pstate=0x00007f8848808370) at nodeResult.c:136
frame #13: 0x00000001071aa5d2 postgres`ExecProcNodeFirst(node=0x00007f8848808370) at execProcnode.c:445
frame #14: 0x00000001071a3332 postgres`ExecProcNode(node=0x00007f8848808370) at executor.h:244
frame #15: 0x000000010719ecf1 postgres`ExecutePlan(estate=0x00007f8848808118, planstate=0x00007f8848808370, use_parallel_mode=false, operation=CMD_SELECT, sendTuples=true, numberTuples=0, direction=ForwardScanDirection, dest=0x00007f8847823650, execute_once=true) at execMain.c:1643
frame #16: 0x000000010719ebb2 postgres`standard_ExecutorRun(queryDesc=0x00007f8848802d18, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:362
frame #17: 0x000000010719e982 postgres`ExecutorRun(queryDesc=0x00007f8848802d18, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:306
frame #18: 0x00000001073eb006 postgres`PortalRunSelect(portal=0x00007f8847061518, forward=true, count=0, dest=0x00007f8847823650) at pquery.c:929
frame #19: 0x00000001073ea9bc postgres`PortalRun(portal=0x00007f8847061518, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x00007f8847823650, altdest=0x00007f8847823650, completionTag="") at pquery.c:770
frame #20: 0x00000001073e5fe8 postgres`exec_simple_query(query_string="select pg_switch_wal();") at postgres.c:1215
frame #21: 0x00000001073e5188 postgres`PostgresMain(argc=1, argv=0x00007f8847025538, dbname="postgres", username="guest") at postgres.c:4247
frame #22: 0x0000000107321aa0 postgres`BackendRun(port=0x00007f8846d01720) at postmaster.c:4401
frame #23: 0x0000000107320ea5 postgres`BackendStartup(port=0x00007f8846d01720) at postmaster.c:4092
frame #24: 0x000000010731fdfa postgres`ServerLoop at postmaster.c:1705
frame #25: 0x000000010731d6b9 postgres`PostmasterMain(argc=3, argv=0x00007f8846c03270) at postmaster.c:1378
frame #26: 0x0000000107221339 postgres`main(argc=3, argv=0x00007f8846c03270) at main.c:228
frame #27: 0x00007fff508df015 libdyld.dylib`start + 1
}}}
スイッチ切り替え時(isLogSwitchが真)は、XLogFlushが呼ばれている。
&label(blockquote){引用}; [[backend/access/transam/xlog.c#XLogInsertRecord()>https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/transam/xlog.c;h=a93017feecffae7a7ff65d40b78a89556d55eb51;hb=refs/heads/REL_10_STABLE#l1133]]
#geshi(c){{{
/*
* If this was an XLOG_SWITCH record, flush the record and the empty
* padding space that fills the rest of the segment, and perform
* end-of-segment actions (eg, notifying archiver).
*/
if (isLogSwitch)
{
TRACE_POSTGRESQL_WAL_SWITCH();
XLogFlush(EndPos);
/*
* Even though we reserved the rest of the segment for us, which is
* reflected in EndPos, we return a pointer to just the end of the
* xlog-switch record.
*/
if (inserted)
{
EndPos = StartPos + SizeOfXLogRecord;
if (StartPos / XLOG_BLCKSZ != EndPos / XLOG_BLCKSZ)
{
uint64 offset = XLogSegmentOffset(EndPos, wal_segment_size);
if (offset == EndPos % XLOG_BLCKSZ)
EndPos += SizeOfXLogLongPHD;
else
EndPos += SizeOfXLogShortPHD;
}
}
}
}}}
&code(){XLogArchiveNotify};関数でPostmasterにシグナルが送っていることが分かる。
&label(blockquote){引用}; [[backend/access/transam/xlog.c#XLogWrite()>https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/transam/xlog.c;h=a93017feecffae7a7ff65d40b78a89556d55eb51;hb=refs/heads/REL_10_STABLE#l2504]]
#geshi(c){{{
// backend/access/transam/xlog.c#XLogWrite()
/*
* If we just wrote the whole last page of a logfile segment,
* fsync the segment immediately. This avoids having to go back
* and re-open prior segments when an fsync request comes along
* later. Doing it here ensures that one and only one backend will
* perform this fsync.
*
* This is also the right place to notify the Archiver that the
* segment is ready to copy to archival storage, and to update the
* timer for archive_timeout, and to signal for a checkpoint if
* too many logfile segments have been used since the last
* checkpoint.
*/
if (finishing_seg)
{
issue_xlog_fsync(openLogFile, openLogSegNo);
/* signal that we need to wakeup walsenders later */
WalSndWakeupRequest();
LogwrtResult.Flush = LogwrtResult.Write; /* end of page */
if (XLogArchivingActive())
XLogArchiveNotifySeg(openLogSegNo); // Archiverに知らせる
XLogCtl->lastSegSwitchTime = (pg_time_t) time(NULL);
XLogCtl->lastSegSwitchLSN = LogwrtResult.Flush;
/*
* Request a checkpoint if we've consumed too much xlog since
* the last one. For speed, we first check using the local
* copy of RedoRecPtr, which might be out of date; if it looks
* like a checkpoint is needed, forcibly update RedoRecPtr and
* recheck.
*/
if (IsUnderPostmaster && XLogCheckpointNeeded(openLogSegNo))
{
(void) GetRedoRecPtr();
if (XLogCheckpointNeeded(openLogSegNo))
RequestCheckpoint(CHECKPOINT_CAUSE_XLOG);
}
}
}
}}}
なお、XLogArchiveNotifySegは、&code(){finishing_seg};が&code(){true};となるルートで呼ばれる。pg_switch_walでWALセグメントを最後まで消費し尽くした結果、&code(){finishing_seg};は&code(){true};となる。
#geshi(c){{{
finishing_seg = !ispartialpage &&
(startoffset + npages * XLOG_BLCKSZ) >= wal_segment_size;
}}}
&label(warn){参考};
- [[backend/access/transam/xlogarchive.c>https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/transam/xlogarchive.c;h=7825cfe532e2436115f26feee8d9502a24f340e1;hb=refs/heads/REL_10_STABLE#l491]] - &size(11){&color(gray){https://git.postgresql.org/gitweb/?p=postgresql.git;a=shortlog;h=refs/heads/REL_10_STABLE};};
*** pg_switch_wal()で切り替えられたWALの中身 [#a31d8929]
pg_switch_wal()で切り替えたWALの中身を見てみる。この例では、SWITCHの前にRUNNING_XACTSが入っている。WAL位置に変化がなければpg_switch_walによる切り替えは行われない。
#geshi{{{
$ pg_waldump 000000010000000000000004
rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/04000028, prev 0/03000060, desc: RUNNING_XACTS nextXid 479 latestCompletedXid 478 oldestRunningXid 479
rmgr: XLOG len (rec/tot): 24/ 24, tx: 0, lsn: 0/04000060, prev 0/04000028, desc: SWITCH
}}}
WALは、セグメント最後まで0で埋められているようである。
#geshi{{{
$ hexdump -C 000000010000000000000004
00000000 99 d0 06 00 01 00 00 00 00 00 00 04 00 00 00 00 |................|
00000010 00 00 00 00 00 00 00 00 0c bc 5c 73 8b d0 e8 5c |..........\s...\|
00000020 00 00 00 01 00 20 00 00 32 00 00 00 00 00 00 00 |..... ..2.......|
00000030 60 00 00 03 00 00 00 00 10 08 00 00 af ec d1 36 |`..............6|
00000040 ff 18 00 00 00 00 00 00 00 00 00 06 04 95 df 01 |................|
00000050 00 00 df 01 00 00 de 01 00 00 00 00 00 00 00 00 |................|
00000060 18 00 00 00 00 00 00 00 28 00 00 04 00 00 00 00 |........(.......|
00000070 40 00 00 00 67 87 65 b0 00 00 00 00 00 00 00 00 |@...g.e.........|
00000080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
01000000
}}}
switch後は、walwriterによって新しいwalセグメントファイルが生成される。(tempファイルに8192byteごとに0埋めされたデータ書き込みが行われた後renameされる)
switch後は、walwriterによって新しいwalセグメントファイルが生成される。(tempファイルに8192byteごとに0埋めされたデータ書き込みが行われた後renameされる)。WAL書き込みは8192byteのブロックの倍数単位で行われる。
#geshi(c){{{
/* Need to seek in the file? */
if (openLogOff != startoffset)
{
if (lseek(openLogFile, (off_t) startoffset, SEEK_SET) < 0)
ereport(PANIC,
(errcode_for_file_access(),
errmsg("could not seek in log file %s to offset %u: %m",
XLogFileNameP(ThisTimeLineID, openLogSegNo),
startoffset)));
openLogOff = startoffset;
}
/* OK to write the page(s) */
from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ;
nbytes = npages * (Size) XLOG_BLCKSZ;
nleft = nbytes;
do
{
errno = 0;
pgstat_report_wait_start(WAIT_EVENT_WAL_WRITE);
written = write(openLogFile, from, nleft);
pgstat_report_wait_end();
if (written <= 0)
{
if (errno == EINTR)
continue;
ereport(PANIC,
(errcode_for_file_access(),
errmsg("could not write to log file %s "
"at offset %u, length %zu: %m",
XLogFileNameP(ThisTimeLineID, openLogSegNo),
openLogOff, nbytes)));
}
nleft -= written;
from += written;
} while (nleft > 0);
}}}
&label(warn){参考};
- [[backend/access/transam/xlog.c#XLogFileInit()>https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/transam/xlog.c;h=a93017feecffae7a7ff65d40b78a89556d55eb51;hb=refs/heads/REL_10_STABLE#l3168]]
- [[backend/access/transam/xlog.c#XLogWrite()>https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/transam/xlog.c;h=a93017feecffae7a7ff65d40b78a89556d55eb51;hb=refs/heads/REL_10_STABLE#l2474]]
* XLogArchiveNotifyのコールグラフ [#w44609ca]
Archiverを起こす人は誰か、参考までにコールグラフを以下に示す。
#geshi(c){{{
XLogArchiveNotify(const char *) : void
KeepFileRestoredFromArchive(const char *, const char *) : void
readTimeLineHistory(TimeLineID) : List *
rescanLatestTimeLine() : _Bool
StartReplication(StartReplicationCmd *) : void
WaitForWALToBecomeAvailable(XLogRecPtr, _Bool, _Bool, XLogRecPtr) : _Bool
XLogFileReadAnyTLI(XLogSegNo, int, int) : int
XLogReadDetermineTimeline(XLogReaderState *, XLogRecPtr, uint32) : void
XLogSendPhysical() : void
restoreTimeLineHistoryFiles(TimeLineID, TimeLineID) : void
rescanLatestTimeLine() : _Bool
StartupXLOG() : void
XLogFileRead(XLogSegNo, int, TimeLineID, int, _Bool) : int
WaitForWALToBecomeAvailable(XLogRecPtr, _Bool, _Bool, XLogRecPtr) : _Bool
XLogFileReadAnyTLI(XLogSegNo, int, int) : int (2 一致)
StartupXLOG() : void
InitPostgres(const char *, Oid, const char *, Oid, char *, _Bool) : void
AutoVacLauncherMain(int, char * *) : void
AutoVacWorkerMain(int, char * *) : void
BackgroundWorkerInitializeConnection(const char *, const char *, uint32) : void
BackgroundWorkerInitializeConnectionByOid(Oid, Oid, uint32) : void
BootstrapModeMain() : void
PostgresMain(int, char * *, const char *, const char *) : void
StartupProcessMain() : void
AuxiliaryProcessMain(int, char * *) : void
WalReceiverMain() : void
AuxiliaryProcessMain(int, char * *) : void
main(int, char * *) : int
StartChildProcess(AuxProcType) : pid_t
MaybeStartWalReceiver() : void
PostmasterMain(int, char * *) : void
PostmasterStateMachine() : void (2 一致)
reaper(int) : void (3 一致)
ServerLoop() : int (3 一致)
sigusr1_handler(int) : void (2 一致)
writeTimeLineHistory(TimeLineID, TimeLineID, XLogRecPtr, char *) : void
StartupXLOG() : void
InitPostgres(const char *, Oid, const char *, Oid, char *, _Bool) : void
AutoVacLauncherMain(int, char * *) : void
AutoVacWorkerMain(int, char * *) : void
BackgroundWorkerInitializeConnection(const char *, const char *, uint32) : void
BackgroundWorkerInitializeConnectionByOid(Oid, Oid, uint32) : void
BootstrapModeMain() : void
PostgresMain(int, char * *, const char *, const char *) : void
StartupProcessMain() : void
AuxiliaryProcessMain(int, char * *) : void
XLogArchiveCheckDone(const char *) : _Bool
CleanupBackupHistory() : void
do_pg_stop_backup(char *, _Bool, TimeLineID *) : XLogRecPtr
perform_base_backup(basebackup_options *) : void
pg_stop_backup_v2(FunctionCallInfo) : Datum (2 一致)
pg_stop_backup(FunctionCallInfo) : Datum
RemoveOldXlogFiles(XLogSegNo, XLogRecPtr, XLogRecPtr) : void
CreateCheckPoint(int) : void
CheckpointerMain() : void
AuxiliaryProcessMain(int, char * *) : void
RequestCheckpoint(int) : void
createdb(ParseState *, const CreatedbStmt *) : Oid (2 一致)
do_pg_start_backup(const char *, _Bool, TimeLineID *, StringInfo, List * *, StringInfo, _Bool, _Bool) : XLogRecPtr
dropdb(const char *, _Bool) : void
DropTableSpace(DropTableSpaceStmt *) : void
movedb(const char *, const char *) : void (2 一致)
standard_ProcessUtility(PlannedStmt *, const char *, ProcessUtilityContext, ParamListInfo, QueryEnvironment *, DestReceiver *, char *) : void
StartupXLOG() : void (2 一致)
XLogPageRead(XLogReaderState *, XLogRecPtr, int, XLogRecPtr, char *, TimeLineID *) : int
XLogWrite(XLogwrtRqst, _Bool) : void
ShutdownXLOG(int, Datum) : void
CheckpointerMain() : void
InitPostgres(const char *, Oid, const char *, Oid, char *, _Bool) : void
StartupXLOG() : void
InitPostgres(const char *, Oid, const char *, Oid, char *, _Bool) : void
StartupProcessMain() : void
CreateRestartPoint(int) : _Bool
CheckpointerMain() : void
ShutdownXLOG(int, Datum) : void
XLogArchiveNotifySeg(XLogSegNo) : void
XLogWrite(XLogwrtRqst, _Bool) : void
AdvanceXLInsertBuffer(XLogRecPtr, _Bool) : void
GetXLogBuffer(XLogRecPtr) : char *
XLogBackgroundFlush() : _Bool
XLogBackgroundFlush() : _Bool
WalSndWaitForWal(XLogRecPtr) : XLogRecPtr
WalWriterMain() : void
XLogFlush(XLogRecPtr) : void
CheckPointReplicationOrigin() : void
CreateCheckPoint(int) : void
CreateEndOfRecoveryRecord() : void
EndPrepare(GlobalTransaction) : void
finish_sync_worker() : void
FlushBuffer(BufferDesc *, SMgrRelation) : void
RecordTransactionAbortPrepared(TransactionId, int, TransactionId *, int, RelFileNode *, const char *) : void
RecordTransactionCommit() : TransactionId
RecordTransactionCommitPrepared(TransactionId, int, TransactionId *, int, RelFileNode *, int, SharedInvalidationMessage *, _Bool, const char *) : void
RelationTruncate(Relation, BlockNumber) : void
ReplicationSlotReserveWal() : void
replorigin_get_progress(RepOriginId, _Bool) : XLogRecPtr
replorigin_session_get_progress(_Bool) : XLogRecPtr
SlruPhysicalWritePage(SlruCtl, int, int, SlruFlush) : _Bool
smgr_redo(XLogReaderState *) : void
write_relmap_file(_Bool, RelMapFile *, _Bool, _Bool, _Bool, Oid, Oid, const char *) : void
WriteMTruncateXlogRec(Oid, MultiXactId, MultiXactId, MultiXactOffset, MultiXactOffset) : void
WriteTruncateXlogRec(int, TransactionId, Oid) : void
xact_redo_commit(xl_xact_parsed_commit *, TransactionId, XLogRecPtr, RepOriginId) : void (2 一致)
XLogInsertRecord(XLogRecData *, XLogRecPtr, uint8) : XLogRecPtr
XLogReportParameters() : void
XLogWalRcvWrite(char *, Size, XLogRecPtr) : void
XLogWalRcvProcessMsg(unsigned char, char *, Size) : void
WalReceiverMain() : void
AuxiliaryProcessMain(int, char * *) : void
}}}
* archive_timeout [#g08eff1f]
WALセグメントの強制切り替えは、&code(){archive_timeout};を指定しても可能である。この値は、デフォルトは0で無効である。小さな値に設定すると、頻繁にWALセグメント切り替えが走るため、無駄にディスク領域を使用してしまう。
WALセグメントの切り替えは、pg_switch_wal関数が呼び出しているRequestXLogSwitch()に帰着する。したがって、archive_timeoutは、pg_switch_wal()を指定時間ごとに実行している、と言える。archive_timeoutも前回の切り替わり位置からWAL位置に変更がなければ、無駄な切り替えは行われない。
archive_timeoutは、最後にWALスイッチされた時刻からの経過秒を見ている。何らかのきっかけで、最後のarchive_timeout以降にWALセグメントがスイッチされている場合、スイッチされた時間からarchive_timeout秒経過するまで、無駄にスイッチされることはない。
&label(blockquote){引用}; [[backend/postmaster/checkpointer.c#CheckArchiveTimeout()>https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/postmaster/checkpointer.c;h=2d0a704c9f16c8428984ffb51adf13f90eae2443;hb=refs/heads/REL_10_STABLE#l576]]
#geshi(c){{{
/*
* CheckArchiveTimeout -- check for archive_timeout and switch xlog files
*
* This will switch to a new WAL file and force an archive file write if
* meaningful activity is recorded in the current WAL file. This includes most
* writes, including just a single checkpoint record, but excludes WAL records
* that were inserted with the XLOG_MARK_UNIMPORTANT flag being set (like
* snapshots of running transactions). Such records, depending on
* configuration, occur on regular intervals and don't contain important
* information. This avoids generating archives with a few unimportant
* records.
*/
static void
CheckArchiveTimeout(void)
{
pg_time_t now;
pg_time_t last_time;
XLogRecPtr last_switch_lsn;
if (XLogArchiveTimeout <= 0 || RecoveryInProgress())
return;
now = (pg_time_t) time(NULL);
/* First we do a quick check using possibly-stale local state. */
if ((int) (now - last_xlog_switch_time) < XLogArchiveTimeout)
return;
/*
* Update local state ... note that last_xlog_switch_time is the last time
* a switch was performed *or requested*.
*/
last_time = GetLastSegSwitchData(&last_switch_lsn); // 最後にスイッチされた時刻
last_xlog_switch_time = Max(last_xlog_switch_time, last_time); // 最後のスイッチ時刻
/* Now we can do the real checks */
if ((int) (now - last_xlog_switch_time) >= XLogArchiveTimeout) // archive_timeout秒経過している?
{
/*
* Switch segment only when "important" WAL has been logged since the
* last segment switch (last_switch_lsn points to end of segment
* switch occurred in).
*/
if (GetLastImportantRecPtr() > last_switch_lsn)
{
XLogRecPtr switchpoint;
/* mark switch as unimportant, avoids triggering checkpoints */
switchpoint = RequestXLogSwitch(true);
/*
* If the returned pointer points exactly to a segment boundary,
* assume nothing happened.
*/
if ((switchpoint % XLogSegSize) != 0)
elog(DEBUG1, "write-ahead log switch forced (archive_timeout=%d)",
XLogArchiveTimeout);
}
/*
* Update state in any case, so we don't retry constantly when the
* system is idle.
*/
last_xlog_switch_time = now; // CheckArchiveTimeoutによる最後のスイッチ時刻を更新
}
}
}}}
* WALアーカイブの動作確認 [#r350d896]
pg_switch_wal、archive_timeoutのタイミングで何が起きているのかウォッチするコマンドのサンプル。
#geshi(bash){{{
# クラスタセットアップ
$ pgenv cluster -D test setup --start-port 5432 p
$ cat <<EOF >> test/p.conf
archive_timeout = 30
log_min_messages = debug5
EOF
$ pgenv cluster -D test exec p pg_ctl restart
}}}
別のターミナルでpg_waldumpでウォッチする。
#geshi(bash){{{
$ watch --interval 1 'for f in `ls test/p/pg_wal/0* | sort | tail -2`; do echo "===== $f" && pg_waldump $f;done'
===== test/p/pg_wal/000000010000000000000005
rmgr: XLOG len (rec/tot): 106/ 106, tx: 0, lsn: 0/05000028, prev 0/040000D0, desc: CHECKPOINT_SHUTDOWN redo 0/5000028; tli 1; prev tli 1; fpw true; xid 0:479; oid 12914; multi 1; offset 0; oldest xid 472 in DB 1;
oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 0; shutdown
rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/05000098, prev 0/05000028, desc: RUNNING_XACTS nextXid 479 latestCompletedXid 478 oldestRunningXid 479
rmgr: XLOG len (rec/tot): 24/ 24, tx: 0, lsn: 0/050000D0, prev 0/05000098, desc: SWITCH
===== test/p/pg_wal/000000010000000000000006
rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/06000028, prev 0/050000D0, desc: RUNNING_XACTS nextXid 479 latestCompletedXid 478 oldestRunningXid 479
rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/06000060, prev 0/06000028, desc: RUNNING_XACTS nextXid 479 latestCompletedXid 478 oldestRunningXid 479
rmgr: XLOG len (rec/tot): 106/ 106, tx: 0, lsn: 0/06000098, prev 0/06000060, desc: CHECKPOINT_ONLINE redo 0/6000060; tli 1; prev tli 1; fpw true; xid 0:479; oid 12914; multi 1; offset 0; oldest xid 472 in DB 1; o
ldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 479; online
rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/06000108, prev 0/06000098, desc: RUNNING_XACTS nextXid 479 latestCompletedXid 478 oldestRunningXid 479
rmgr: XLOG len (rec/tot): 24/ 24, tx: 0, lsn: 0/06000140, prev 0/06000108, desc: SWITCH
}}}
別のターミナルでpg_walとアーカイブ先をウォッチする。
#geshi(bash){{{
$ watch --interval 1 "bash -c 'ls test/archivedir_p/0* ; ls test/p/pg_wal/0*' | tail -10"
test/archivedir_p/000000010000000000000003
test/archivedir_p/000000010000000000000004
test/archivedir_p/000000010000000000000005
test/archivedir_p/000000010000000000000006
test/p/pg_wal/000000010000000000000001
test/p/pg_wal/000000010000000000000002
test/p/pg_wal/000000010000000000000003
test/p/pg_wal/000000010000000000000004
test/p/pg_wal/000000010000000000000005
test/p/pg_wal/000000010000000000000006
}}}
別のターミナルでサーバログをウォッチする。
#geshi(bash){{{
$ pgenv cluster -D test tail -f p
2019-03-23 00:54:34.026 JST [1971] DEBUG: CommitTransaction(1) name: unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-03-23 00:54:34.026 JST [1971] DEBUG: shmem_exit(0): 1 before_shmem_exit callbacks to make
2019-03-23 00:54:34.026 JST [1971] DEBUG: shmem_exit(0): 7 on_shmem_exit callbacks to make
2019-03-23 00:54:34.026 JST [1971] DEBUG: proc_exit(0): 2 callbacks to make
2019-03-23 00:54:34.026 JST [1971] DEBUG: exit(0)
2019-03-23 00:54:34.026 JST [1971] DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2019-03-23 00:54:34.026 JST [1971] DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2019-03-23 00:54:34.026 JST [1971] DEBUG: proc_exit(-1): 0 callbacks to make
2019-03-23 00:54:34.026 JST [76569] DEBUG: reaping dead processes
2019-03-23 00:54:34.026 JST [76569] DEBUG: server process (PID 1971) exited with exit code 0
}}}
別のターミナルでpg_switch_wal()を実行する。
#geshi(bash){{{
$ echo "select pg_switch_wal()" | pgenv cluster -D test exec p psql
pg_switch_wal
---------------
0/7000000
(1 row)
}}}
&label(warn){参考}; [[pgenv2>https://github.com/moritetu/pgenv2]] - &size(11){&color(gray){on https://github.com/moritetu/pgenv2};};
* 参考リンク [#p7aa34cf]
- [[25.3. 継続的アーカイブとポイントインタイムリカバリ(PITR)>https://www.postgresql.jp/document/10/html/continuous-archiving.html]] - &size(11){&color(gray){on https://www.postgresql.jp/document/10/html/continuous-archiving.html};};
- [[PostgreSQLのarchiverの挙動を調べた>https://qiita.com/U_ikki/items/dd9a58857a4ffb84f97d]] - &size(11){&color(gray){on https://qiita.com/U_ikki/items/dd9a58857a4ffb84f97d]};};
- [[あなたの知らないPostgreSQL監視の世界>https://www.slideshare.net/naka24nori/postgre-sql-55593111]] - &size(11){&color(gray){on https://www.slideshare.net/naka24nori/postgre-sql-55593111};};