PostgreSQL/解析

アーカイブ

データベースをPITR(ポイントタイムリカバリ)で任意の地点に復旧させるのにデータベースの先行書き込みログ(WAL)が必要となる。アーカイブ設定を行なうことで、PostgreSQLの機能により、walディレクトリのWALを定期的にバックアップ領域に保存できるようになる。PITRでは、ファイルシステムレベルのバックアップ(ex: pg_basebackup)とWALバックアップを用いて復旧させる。

Archiver

archiver.png

アーカイブコピーの契機

WALアーカイブ処理の流れ

参考

pg_switch_wal()

pg_switch_wal関数は、強制的にWALセグメントの切り替えを行なう。
pg_waldumpでWALの内部を解析すると、XLOG rmgrのinfo値でXLOG_SWITCHという種別のWALが書き込まれる。

rmgr: XLOG        len (rec/tot):     24/    24, tx:          0, lsn: 0/18000060, prev 0/18000028, desc: SWITCH 

連続実行してもWAL位置に変更がなければスイッチはされない。その場合は、同じSWITCHポイントが戻る。

-- Archiverの統計情報確認
SELECT * FROM pg_stat_archiver
-[ RECORD 1 ]------+------------------------------
archived_count     | 1
last_archived_wal  | 000000010000000000000001
last_archived_time | 2019-03-20 21:44:59.918306+09
failed_count       | 0
last_failed_wal    | 
last_failed_time   | 
stats_reset        | 2019-03-20 21:42:17.409563+09

-- リセット
SELECT pg_stat_reset_shared('archiver')
-[ RECORD 1 ]--------+-
pg_stat_reset_shared | 

-- リセットした後
SELECT * FROM pg_stat_archiver
-[ RECORD 1 ]------+------------------------------
archived_count     | 0
last_archived_wal  | 
last_archived_time | 
failed_count       | 0
last_failed_wal    | 
last_failed_time   | 
stats_reset        | 2019-04-20 21:46:54.795811+09

参考

SWITCH切り替え時、WALセグメントはセグメントサイズまで消費される。

引用 backend/access/transam/xlog.c#CopyXLogRecordToWAL()

-- 失敗
SELECT * FROM pg_stat_archiver
-[ RECORD 1 ]------+------------------------------
archived_count     | 0
last_archived_wal  | 
last_archived_time | 
failed_count       | 36
last_failed_wal    | 000000010000000000000002
last_failed_time   | 2019-03-21 23:28:10.222111+09
stats_reset        | 2019-03-20 21:46:54.795811+09

-- 失敗
-- failed_countは +3
-- last_failed_timeは、+62 (sec)
SELECT * FROM pg_stat_archiver
-[ RECORD 1 ]------+------------------------------
archived_count     | 0
last_archived_wal  | 
last_archived_time | 
failed_count       | 39
last_failed_wal    | 000000010000000000000002
last_failed_time   | 2019-03-21 23:29:12.234861+09
stats_reset        | 2019-03-20 21:46:54.795811+09

... 省略

-- 成功
-- archived_countは、溜まっていたWALファイルの数で6
SELECT * FROM pg_stat_archiver
-[ RECORD 1 ]------+------------------------------
archived_count     | 6
last_archived_wal  | 000000010000000000000007
last_archived_time | 2019-03-21 23:31:14.954457+09
failed_count       | 42
last_failed_wal    | 000000010000000000000002
last_failed_time   | 2019-03-21 23:30:14.245314+09
stats_reset        | 2019-03-20 21:46:54.795811+09

また、強制切り替えを行なった後、切り替わりでSWITCHが書き込まれたWALログは、即座にアーカイブできるようにしている。内部的にはPostmasterに対し、archiverを起こすためのシグナルが送られている。(PMSIGNAL_WAKEN_ARCHIVER

postgres=# select pg_switch_wal();
 pg_switch_wal 
---------------
 0/18000078
(1 row)

logレベルdebug5で出るメッセージ例は以下。

2019-03-20 15:20:12.692 JST [51362] DEBUG:  StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-03-20 15:20:14.895 JST [51362] DEBUG:  CommitTransaction(1) name: unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-03-20 15:20:14.895 JST [50570] DEBUG:  executing archive command "cp pg_wal/000000010000000000000030 "/Users/guest/archivedir_p/000000010000000000000030""
2019-03-20 15:20:14.950 JST [50570] DEBUG:  archived write-ahead log file "000000010000000000000030"

以下はバックトレース(master: pg12)。

# SELECT pg_switch_wal();を実行している

postgres=# \watch 14/20 17:38:51 2019 (every 1s)

 pg_switch_wal 
---------------
 0/45000078
(1 ROW)4/20 17:38:55 2019 (every 1s)

 pg_switch_wal 
---------------
 0/46000078
(1 ROW)4/20 17:38:56 2019 (every 1s)

 pg_switch_wal 
---------------
 0/47000000
(1 ROW)4/20 17:38:57 2019 (every 1s)

 pg_switch_wal 
---------------
 0/47000000
(1 ROW)4/20 17:38:58 2019 (every 1s)

 pg_switch_wal 
---------------
 0/47000000
(1 ROW)4/20 17:38:59 2019 (every 1s)

 pg_switch_wal 
---------------
 0/47000000
(1 ROW)4/20 17:39:00 2019 (every 1s)

 pg_switch_wal 
---------------
 0/47000000
(1 ROW)
 

XLogArchiveNotify関数でpostmasterにシグナルが送られていることが分かる。

引用 backend/access/transam/xlog.c#XLogInsertRecord()

// backend/access/transam/xlog.c#CopyXLogRecordToWAL()
	/*
	 * If this was an xlog-switch, it's not enough to write the switch record,
	 * we also have to consume all the remaining space in the WAL segment.  We
	 * have already reserved that space, but we need to actually fill it.
	 */
	if (isLogSwitch && XLogSegmentOffset(CurrPos, wal_segment_size) != 0)
	{
		/* An xlog-switch record doesn't contain any data besides the header */
		Assert(write_len == SizeOfXLogRecord);

		/* Assert that we did reserve the right amount of space */
		Assert(XLogSegmentOffset(EndPos, wal_segment_size) == 0);

		/* Use up all the remaining space on the current page */
		CurrPos += freespace;

		/*
		 * Cause all remaining pages in the segment to be flushed, leaving the
		 * XLog position where it should be, at the start of the next segment.
		 * We do this one page at a time, to make sure we don't deadlock
		 * against ourselves if wal_buffers < wal_segment_size.
		 */
		while (CurrPos < EndPos)
		{
			/*
			 * The minimal action to flush the page would be to call
			 * WALInsertLockUpdateInsertingAt(CurrPos) followed by
			 * AdvanceXLInsertBuffer(...).  The page would be left initialized
			 * mostly to zeros, except for the page header (always the short
			 * variant, as this is never a segment's first page).
			 *
			 * The large vistas of zeros are good for compressibility, but the
			 * headers interrupting them every XLOG_BLCKSZ (with values that
			 * differ from page to page) are not.  The effect varies with
			 * compression tool, but bzip2 for instance compresses about an
			 * order of magnitude worse if those headers are left in place.
			 *
			 * Rather than complicating AdvanceXLInsertBuffer itself (which is
			 * called in heavily-loaded circumstances as well as this lightly-
			 * loaded one) with variant behavior, we just use GetXLogBuffer
			 * (which itself calls the two methods we need) to get the pointer
			 * and zero most of the page.  Then we just zero the page header.
			 */
			currpos = GetXLogBuffer(CurrPos);
			MemSet(currpos, 0, SizeOfXLogShortPHD);

			CurrPos += XLOG_BLCKSZ;
		}
	}
	else
	{
		/* Align the end position, so that the next record starts aligned */
		CurrPos = MAXALIGN64(CurrPos);
	}

引用 backend/access/transam/xlog.c#XLogWrite()

* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 17.1
  * frame #0: 0x0000000106fe89ed postgres`XLogArchiveNotify(xlog="000000010000000000000019") at xlogarchive.c:531
    frame #1: 0x0000000106fe8afd postgres`XLogArchiveNotifySeg(segno=25) at xlogarchive.c:543
    frame #2: 0x0000000106fd1015 postgres`XLogWrite(WriteRqst=(Write = 436207616, Flush = 436207616), flexible=false) at xlog.c:2540
    frame #3: 0x0000000106fd027b postgres`XLogFlush(record=436207616) at xlog.c:2921
    frame #4: 0x0000000106fcf34d postgres`XLogInsertRecord(rdata=0x000000010783d1d0, fpw_lsn=0, flags='\0') at xlog.c:1150
    frame #5: 0x0000000106febd0f postgres`XLogInsert(rmid='\0', info='@') at xloginsert.c:462
    frame #6: 0x0000000106fdf8a5 postgres`RequestXLogSwitch(mark_unimportant=false) at xlog.c:9387
    frame #7: 0x0000000106fe9c79 postgres`pg_switch_wal(fcinfo=0x00007f8848808880) at xlogfuncs.c:292
    frame #8: 0x00000001071927ac postgres`ExecInterpExpr(state=0x00007f8848808798, econtext=0x00007f8848808488, isnull=0x00007ffee8d3d81f) at execExprInterp.c:625
    frame #9: 0x0000000107191b92 postgres`ExecInterpExprStillValid(state=0x00007f8848808798, econtext=0x00007f8848808488, isNull=0x00007ffee8d3d81f) at execExprInterp.c:1769
    frame #10: 0x00000001071e2a3b postgres`ExecEvalExprSwitchContext(state=0x00007f8848808798, econtext=0x00007f8848808488, isNull=0x00007ffee8d3d81f) at executor.h:312
    frame #11: 0x00000001071e29be postgres`ExecProject(projInfo=0x00007f8848808790) at executor.h:346
    frame #12: 0x00000001071e26f3 postgres`ExecResult(pstate=0x00007f8848808370) at nodeResult.c:136
    frame #13: 0x00000001071aa5d2 postgres`ExecProcNodeFirst(node=0x00007f8848808370) at execProcnode.c:445
    frame #14: 0x00000001071a3332 postgres`ExecProcNode(node=0x00007f8848808370) at executor.h:244
    frame #15: 0x000000010719ecf1 postgres`ExecutePlan(estate=0x00007f8848808118, planstate=0x00007f8848808370, use_parallel_mode=false, operation=CMD_SELECT, sendTuples=true, numberTuples=0, direction=ForwardScanDirection, dest=0x00007f8847823650, execute_once=true) at execMain.c:1643
    frame #16: 0x000000010719ebb2 postgres`standard_ExecutorRun(queryDesc=0x00007f8848802d18, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:362
    frame #17: 0x000000010719e982 postgres`ExecutorRun(queryDesc=0x00007f8848802d18, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:306
    frame #18: 0x00000001073eb006 postgres`PortalRunSelect(portal=0x00007f8847061518, forward=true, count=0, dest=0x00007f8847823650) at pquery.c:929
    frame #19: 0x00000001073ea9bc postgres`PortalRun(portal=0x00007f8847061518, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x00007f8847823650, altdest=0x00007f8847823650, completionTag="") at pquery.c:770
    frame #20: 0x00000001073e5fe8 postgres`exec_simple_query(query_string="select pg_switch_wal();") at postgres.c:1215
    frame #21: 0x00000001073e5188 postgres`PostgresMain(argc=1, argv=0x00007f8847025538, dbname="postgres", username="guest") at postgres.c:4247
    frame #22: 0x0000000107321aa0 postgres`BackendRun(port=0x00007f8846d01720) at postmaster.c:4401
    frame #23: 0x0000000107320ea5 postgres`BackendStartup(port=0x00007f8846d01720) at postmaster.c:4092
    frame #24: 0x000000010731fdfa postgres`ServerLoop at postmaster.c:1705
    frame #25: 0x000000010731d6b9 postgres`PostmasterMain(argc=3, argv=0x00007f8846c03270) at postmaster.c:1378
    frame #26: 0x0000000107221339 postgres`main(argc=3, argv=0x00007f8846c03270) at main.c:228
    frame #27: 0x00007fff508df015 libdyld.dylib`start + 1

参考

XLogArchiveNotifyのコールグラフ

Archiverを起こす人は誰か、参考までにコールグラフを以下に示す。

	/*
	 * If this was an XLOG_SWITCH record, flush the record and the empty
	 * padding space that fills the rest of the segment, and perform
	 * end-of-segment actions (eg, notifying archiver).
	 */
	if (isLogSwitch)
	{
		TRACE_POSTGRESQL_WAL_SWITCH();
		XLogFlush(EndPos);

		/*
		 * Even though we reserved the rest of the segment for us, which is
		 * reflected in EndPos, we return a pointer to just the end of the
		 * xlog-switch record.
		 */
		if (inserted)
		{
			EndPos = StartPos + SizeOfXLogRecord;
			if (StartPos / XLOG_BLCKSZ != EndPos / XLOG_BLCKSZ)
			{
				uint64		offset = XLogSegmentOffset(EndPos, wal_segment_size);

				if (offset == EndPos % XLOG_BLCKSZ)
					EndPos += SizeOfXLogLongPHD;
				else
					EndPos += SizeOfXLogShortPHD;
			}
		}
	}

参考リンク


トップ   一覧 単語検索 最終更新   ヘルプ   最終更新のRSS
目次
ダブルクリックで閉じるTOP | 閉じる
GO TO TOP