DBA

개인 학습, 작업 기록용입니다. 올바르지 않은 정보가 있다면 댓글주시면 감사하겠습니다!

DATABASE/Postgresql

[PostgreSQL] PANIC: could not locate a valid checkpoint record 해결 [pg_resetwal]

DBnA 2024. 11. 14. 16:44

스펙을 최대한 낮춰서 EC2 ubuntu환경에 postgresql 15 설치하다 사용 중 PG깨지는 현상이 발생했다..

 

가장 큰 문제는 wal log를 수용할 디스크공간이 없던것!!

psql 을 접속할 수 없어서 아무생각없이 wal log폴더 내 데이터 삭제해버림 ㅠ 

 

그 결과 Postgresql 서버가 올라오지 못하고 DB시스템이 shut down됨!! 

2024-11-14 16:05:19.069 KST [7685] LOG:  starting PostgreSQL 15.8 (Ubuntu 15.8-1.pgdg24.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 13.2.0-23ubuntu4) 13.2.0, 64-bit
2024-11-14 16:05:19.071 KST [7685] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2024-11-14 16:05:19.071 KST [7685] LOG:  listening on IPv6 address "::", port 5432
2024-11-14 16:05:19.075 KST [7685] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2024-11-14 16:05:19.083 KST [7689] LOG:  database system was shut down in recovery at 2024-11-14 15:33:24 KST
2024-11-14 16:05:19.083 KST [7689] LOG:  invalid primary checkpoint record
2024-11-14 16:05:19.083 KST [7689] PANIC:  could not locate a valid checkpoint record
2024-11-14 16:05:19.224 KST [7685] LOG:  startup process (PID 7689) was terminated by signal 6: Aborted
2024-11-14 16:05:19.224 KST [7685] LOG:  aborting startup due to startup process failure
2024-11-14 16:05:19.226 KST [7685] LOG:  database system is shut down

 


작업이 중단된 상태에서 롤백시 읽을 wal로그가 없으니..트랜잭션이 꼬이고꼬이고..

체크포인트 에러 발생하여 DB가 올라오지않음.
테스트용 DB라 백업파일도 없고 중요 데이터 작업은 아니였으므로 pg_resetwal 로 실행시켰다. 

 

pg_resetwal : wal log 마지막 정상시점으로 초기화시켜줌 ( 기존의 wal file 사라짐)

*서버실행할 수 없을때 특단의 조치 

 

 

사용법

#사용전 서버 중지해야함!!

pg_resetwal[ -f| --force] [ -n| --dry-run] [ option...] [ -D| --pgdata]datadir

 

해결과정

#psql 접속안됨 
postgres@ip-10-0-10-87:~/15/main/log$ psql
psql: error: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: No such file or directory
        Is the server running locally and accepting connections on that socket?

#작업 전 PG 서비스내림 
root@ip-10-0-10-87:~# systemctl stop postgresql
root@ip-10-0-10-87:~# systemctl status postgresql
○ postgresql.service - PostgreSQL RDBMS
     Loaded: loaded (/usr/lib/systemd/system/postgresql.service; enabled; preset: enabled)
     Active: inactive (dead) since Thu 2024-11-14 16:19:10 KST; 12min ago
   Duration: 13min 50.912s
    Process: 7692 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
   Main PID: 7692 (code=exited, status=0/SUCCESS)
        CPU: 3ms

Nov 14 16:05:19 ip-10-0-10-87 systemd[1]: Starting postgresql.service - PostgreSQL RDBMS...
Nov 14 16:05:19 ip-10-0-10-87 systemd[1]: Finished postgresql.service - PostgreSQL RDBMS.
Nov 14 16:19:10 ip-10-0-10-87 systemd[1]: postgresql.service: Deactivated successfully.
Nov 14 16:19:10 ip-10-0-10-87 systemd[1]: Stopped postgresql.service - PostgreSQL RDBMS.


#PG 데이터 
postgres@ip-10-0-10-87:~$ cd $PGDATA

#명령어 입력
postgres@ip-10-0-10-87:~/15/main$ pg_resetwal -f /var/lib/postgresql/15/main
Write-ahead log reset

#pg start
root@ip-10-0-10-87:~# systemctl start postgresql
root@ip-10-0-10-87:~# systemctl status postgresql
● postgresql.service - PostgreSQL RDBMS
     Loaded: loaded (/usr/lib/systemd/system/postgresql.service; enabled; preset: enabled)
     Active: active (exited) since Thu 2024-11-14 16:32:33 KST; 3s ago
    Process: 8225 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
   Main PID: 8225 (code=exited, status=0/SUCCESS)
        CPU: 1ms

Nov 14 16:32:33 ip-10-0-10-87 systemd[1]: Starting postgresql.service - PostgreSQL RDBMS...
Nov 14 16:32:33 ip-10-0-10-87 systemd[1]: Finished postgresql.service - PostgreSQL RDBMS.

#psql 정상 접속 확인
postgres@ip-10-0-10-87:~/15/main$ psql
psql (15.8 (Ubuntu 15.8-1.pgdg24.04+1))
Type "help" for help.

postgres=#

 

#wal 파일 조회 -> 기존에 있던 다수의 파일이 삭제됨
postgres@ip-10-0-10-87:/$ cd /var/lib/postgresql/15/main/pg_wal
postgres@ip-10-0-10-87:~/15/main/pg_wal$ ls
000000010000000200000079  archive_status

 

 

 

테스트라 다행이지..운영에선 백업열심히하고 최대한 사용하지 않는걸로....

 

 

 

 

** 추가 확인

작업했던 DB와 테이블을 알아서 조회해보니.. 

ERROR:  pg_attribute catalog is missing 4 attribute(s) for relation OID 24736

역시나 에러 발생한다.

postgres=# \c pgbenchtest
You are now connected to database "pgbenchtest" as user "postgres".
pgbenchtest=# \dt+
ERROR:  pg_attribute catalog is missing 4 attribute(s) for relation OID 24736


pgbenchtest=# \x
Expanded display is on.
pgbenchtest=# select * from pg_class where oid =24736;
-[ RECORD 1 ]-------+-----------------
oid                 | 24736
relname             | pgbench_accounts
relnamespace        | 2200

 

해결하기 전에 postgresql.conf 파일 수정할게 있어서 수정하고 재시작했더니

해당 오브젝트가 그냥 사라짐..

그러고 정상조회된다.

나중에 확인해봐야할듯..