Notes on Resolving Ceph Cluster Errors

0. Current Ceph and CentOS versions:
[root@ceph1 ceph]# ceph -v
ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)
[root@ceph1 ceph]# cat /etc/redhat-release
CentOS Linux release 7.5.1804 (Core)

1. Config file content differs between nodes
Running ceph-deploy mon create-initial to gather the keys generates several keyring files in the current directory (~/etc/ceph/ in my case), but it fails with the error below: the ceph.conf on the two failing nodes differs from the one on the current node, and the tool suggests using the --overwrite-conf option to overwrite the inconsistent config files.
[root@ceph1 ceph]# ceph-deploy mon create-initial
...
[ceph2][DEBUG ] remote hostname: ceph2
[ceph2][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph_deploy.mon][ERROR ] RuntimeError: config file /etc/ceph/ceph.conf exists with different content; use --overwrite-conf to overwrite
[ceph_deploy][ERROR ] GenericError: Failed to create 2 monitors
...

Run the following command (I have three nodes here, ceph1~3):
[root@ceph1 ceph]# ceph-deploy --overwrite-conf mon create ceph{3,1,2}
...
[ceph2][DEBUG ] remote hostname: ceph2
[ceph2][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph2][DEBUG ] create the mon path if it does not exist
[ceph2][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-ceph2/done
...

After this the configuration succeeds and you can continue with initializing the disks.
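If the ceph.conf in the deploy working directory is the copy you want everywhere, the same --overwrite-conf flag can also be used to push it to all nodes before re-running the monitor step. A minimal sketch, assuming the node names ceph1~3 from above:
[root@ceph1 ceph]# ceph-deploy --overwrite-conf config push ceph1 ceph2 ceph3
[root@ceph1 ceph]# ceph-deploy mon create-initial
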
2. too few PGs per OSD (21 < min 30) warning
[root@ceph1 ceph]# ceph -s
  cluster:
    id:     8e2248e4-3bb0-4b62-ba93-f597b1a3bd40
    health: HEALTH_WARN
            too few PGs per OSD (21 < min 30)

  services:
    mon: 3 daemons, quorum ceph2,ceph1,ceph3
    mgr: ceph2(active), standbys: ceph1, ceph3
    osd: 3 osds: 3 up, 3 in
    rgw: 1 daemon active

  data:
    pools:   4 pools, 32 pgs
    objects: 219  objects, 1.1 KiB
    usage:   3.0 GiB used, 245 GiB / 248 GiB avail
    pgs:     32 active+clean

$ ?! \' _9 E6 E- z) N  u5 C从上面集群状态信息可查,每个osd上的pg数量=21<最小的数目30个。pgs为32,因为我之前设置的是2副本的配置,所以当有3个osd的时候,每个osd上均分了32÷3*2=21个pgs,也就是出现了如上的错误 小于最小配置30个。
( _* n; b7 V1 J3 k3 i* \" V集群这种状态如果进行数据的存储和操作,会发现集群卡死,无法响应io,同时会导致大面积的osd down。1 o/ B6 ]! G. S- A, m0 D
解决办法:增加pg数" d& w6 o8 z: }3 s/ X
因为我的一个pool有8个pgs,所以我需要增加两个pool才能满足osd上的pg数量=48÷3*2=32>最小的数目30。9 M- }6 R6 T4 Z) _& w- @
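For context, a commonly cited sizing guideline in the Ceph documentation is total PGs ≈ (number of OSDs × 100) ÷ replica size, rounded to a power of two; for this cluster that would suggest roughly 3 × 100 ÷ 2 = 150, i.e. on the order of 128 PGs across all pools, so the 48 PGs reached below are just enough to clear the warning rather than an ideal target.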
[root@ceph1 ceph]# ceph osd pool create mytest 8
pool 'mytest' created
[root@ceph1 ceph]# ceph osd pool create mytest1 8
pool 'mytest1' created
[root@ceph1 ceph]# ceph -s
  cluster:
    id:     8e2248e4-3bb0-4b62-ba93-f597b1a3bd40
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph2,ceph1,ceph3
    mgr: ceph2(active), standbys: ceph1, ceph3
    osd: 3 osds: 3 up, 3 in
    rgw: 1 daemon active

  data:
    pools:   6 pools, 48 pgs
    objects: 219  objects, 1.1 KiB
    usage:   3.0 GiB used, 245 GiB / 248 GiB avail
    pgs:     48 active+clean

0 B( I; Z5 F; g! a9 v- C8 z集群健康状态显示正常。  D+ n/ J) F- r# H
3. Cluster status is HEALTH_WARN application not enabled on 1 pool(s)
If at this point the cluster status shows HEALTH_WARN application not enabled on 1 pool(s):
[root@ceph1 ceph]# ceph -s
  cluster:
    id:     13430f9a-ce0d-4d17-a215-272890f47f28
    health: HEALTH_WARN
            application not enabled on 1 pool(s)

[root@ceph1 ceph]# ceph health detail
HEALTH_WARN application not enabled on 1 pool(s)
POOL_APP_NOT_ENABLED application not enabled on 1 pool(s)
    application not enabled on pool 'mytest'
    use 'ceph osd pool application enable <pool-name> <app-name>', where <app-name> is 'cephfs', 'rbd', 'rgw', or freeform for custom applications.

Running ceph health detail shows that the newly added pool mytest has not been tagged with an application. Since an RGW instance was added earlier, following the hint it is enough to tag mytest with rgw:
[root@ceph1 ceph]# ceph osd pool application enable mytest rgw
enabled application 'rgw' on pool 'mytest'
Checking the cluster status again shows it is back to normal:
[root@ceph1 ceph]# ceph health
HEALTH_OK

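To double-check which application tag a pool carries, the pool's application metadata can also be queried (a minimal sketch using the same mytest pool), which should now report the rgw tag:
[root@ceph1 ceph]# ceph osd pool application get mytest
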
4. Error when deleting a storage pool
Taking the deletion of the mytest pool as an example, running ceph osd pool rm mytest fails, saying you must repeat the pool name after the original one and append the --yes-i-really-really-mean-it flag:
[root@ceph1 ceph]# ceph osd pool rm mytest
Error EPERM: WARNING: this will *PERMANENTLY DESTROY* all data stored in pool mytest.  If you are *ABSOLUTELY CERTAIN* that is what you want, pass the pool name *twice*, followed by --yes-i-really-really-mean-it.

Repeating the pool name and adding the suggested flag as instructed still fails:
[root@ceph1 ceph]# ceph osd pool rm mytest mytest --yes-i-really-really-mean-it
Error EPERM: pool deletion is disabled; you must first set the
mon_allow_pool_delete config option to true before you can destroy a pool

The error says that pool deletion is disabled and that the mon_allow_pool_delete option must first be added to ceph.conf and set to true before a pool can be destroyed. So log in to each node and modify its configuration file, as follows:
[root@ceph1 ceph]# vi ceph.conf
[root@ceph1 ceph]# systemctl restart ceph-mon.target

Add the following parameter at the bottom of ceph.conf and set it to true; after saving and exiting, restart the service with systemctl restart ceph-mon.target.
[mon]
mon allow pool delete = true

Do the same on the remaining nodes:
[root@ceph2 ceph]# vi ceph.conf
[root@ceph2 ceph]# systemctl restart ceph-mon.target
[root@ceph3 ceph]# vi ceph.conf
[root@ceph3 ceph]# systemctl restart ceph-mon.target

Deleting again now succeeds and the mytest pool is removed:
[root@ceph1 ceph]# ceph osd pool rm mytest mytest --yes-i-really-really-mean-it
pool 'mytest' removed

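If you prefer not to edit ceph.conf and restart the mon service on every node, the same option can usually be toggled at runtime and switched back off after the deletion. A sketch of that approach (an assumption to verify on your release; the injected value is not persisted across monitor restarts):
# runtime override, not persisted; verify behaviour on your Ceph release
[root@ceph1 ceph]# ceph tell mon.\* injectargs '--mon-allow-pool-delete=true'
[root@ceph1 ceph]# ceph osd pool rm mytest mytest --yes-i-really-really-mean-it
[root@ceph1 ceph]# ceph tell mon.\* injectargs '--mon-allow-pool-delete=false'
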
5. Troubleshooting after cluster nodes go down and come back up
After shutting down and rebooting each of the three nodes in the Ceph cluster, the cluster status looked like this:
[root@ceph1 ~]# ceph -s
  cluster:
    id:     13430f9a-ce0d-4d17-a215-272890f47f28
    health: HEALTH_WARN
            1 MDSs report slow metadata IOs
            324/702 objects misplaced (46.154%)
            Reduced data availability: 126 pgs inactive
            Degraded data redundancy: 144/702 objects degraded (20.513%), 3 pgs degraded, 126 pgs undersized

  services:
    mon: 3 daemons, quorum ceph2,ceph1,ceph3
    mgr: ceph1(active), standbys: ceph2, ceph3
    mds: cephfs-1/1/1 up  {0=ceph1=up:creating}
    osd: 3 osds: 3 up, 3 in; 162 remapped pgs

  data:
    pools:   8 pools, 288 pgs
    objects: 234  objects, 2.8 KiB
    usage:   3.0 GiB used, 245 GiB / 248 GiB avail
    pgs:     43.750% pgs not active
             144/702 objects degraded (20.513%)
             324/702 objects misplaced (46.154%)
             162 active+clean+remapped
             123 undersized+peered
             3   undersized+degraded+peered

Check the health details:
[root@ceph1 ~]# ceph health detail
HEALTH_WARN 1 MDSs report slow metadata IOs; 324/702 objects misplaced (46.154%); Reduced data availability: 126 pgs inactive; Degraded data redundancy: 144/702 objects degraded (20.513%), 3 pgs degraded, 126 pgs undersized
MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
    mdsceph1(mds.0): 9 slow metadata IOs are blocked > 30 secs, oldest blocked for 42075 secs
OBJECT_MISPLACED 324/702 objects misplaced (46.154%)
PG_AVAILABILITY Reduced data availability: 126 pgs inactive
    pg 8.28 is stuck inactive for 42240.369934, current state undersized+peered, last acting [0]
    pg 8.2a is stuck inactive for 45566.934835, current state undersized+peered, last acting [0]
    pg 8.2d is stuck inactive for 42240.371314, current state undersized+peered, last acting [0]
    pg 8.2f is stuck inactive for 45566.913284, current state undersized+peered, last acting [0]
    pg 8.32 is stuck inactive for 42240.354304, current state undersized+peered, last acting [0]
    ....
    pg 8.28 is stuck undersized for 42065.616897, current state undersized+peered, last acting [0]
    pg 8.2a is stuck undersized for 42065.613246, current state undersized+peered, last acting [0]
    pg 8.2d is stuck undersized for 42065.951760, current state undersized+peered, last acting [0]
    pg 8.2f is stuck undersized for 42065.610464, current state undersized+peered, last acting [0]
    pg 8.32 is stuck undersized for 42065.959081, current state undersized+peered, last acting [0]
    ....

While the data is being repaired, the presence of inactive and undersized PGs is not normal.
Fixes:
① Dealing with the inactive PGs:
Simply restart the OSD services.
[root@ceph1 ~]# systemctl restart ceph-osd.target

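Peering can take a moment after the restart; progress can be followed by streaming the cluster log or simply re-checking the status (a simple sketch):
[root@ceph1 ~]# ceph -w
[root@ceph1 ~]# ceph -s
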
Checking the cluster status again shows that the inactive PGs have recovered; only the undersized PGs remain.
[root@ceph1 ~]# ceph -s
  cluster:
    id:     13430f9a-ce0d-4d17-a215-272890f47f28
    health: HEALTH_WARN
            1 filesystem is degraded
            241/723 objects misplaced (33.333%)
            Degraded data redundancy: 59 pgs undersized

  services:
    mon: 3 daemons, quorum ceph2,ceph1,ceph3
    mgr: ceph1(active), standbys: ceph2, ceph3
    mds: cephfs-1/1/1 up  {0=ceph1=up:rejoin}
    osd: 3 osds: 3 up, 3 in; 229 remapped pgs
    rgw: 1 daemon active

  data:
    pools:   8 pools, 288 pgs
    objects: 241  objects, 3.4 KiB
    usage:   3.0 GiB used, 245 GiB / 248 GiB avail
    pgs:     241/723 objects misplaced (33.333%)
             224 active+clean+remapped
             59  active+undersized
             5   active+clean

  io:
    client:   1.2 KiB/s rd, 1 op/s rd, 0 op/s wr

② Dealing with the undersized PGs:

Get into the habit of checking the health details first when something goes wrong. A closer look shows that although the replica count is set to 3, the PG 12.x placement groups have only two copies each, stored on two of OSDs 0~2.
[root@ceph1 ~]# ceph health detail
HEALTH_WARN 241/723 objects misplaced (33.333%); Degraded data redundancy: 59 pgs undersized
OBJECT_MISPLACED 241/723 objects misplaced (33.333%)
PG_DEGRADED Degraded data redundancy: 59 pgs undersized
    pg 12.8 is stuck undersized for 1910.001993, current state active+undersized, last acting [2,0]
    pg 12.9 is stuck undersized for 1909.989334, current state active+undersized, last acting [2,0]
    pg 12.a is stuck undersized for 1909.995807, current state active+undersized, last acting [0,2]
    pg 12.b is stuck undersized for 1910.009596, current state active+undersized, last acting [1,0]
    pg 12.c is stuck undersized for 1910.010185, current state active+undersized, last acting [0,2]
    pg 12.d is stuck undersized for 1910.001526, current state active+undersized, last acting [1,0]
    pg 12.e is stuck undersized for 1909.984982, current state active+undersized, last acting [2,0]
    pg 12.f is stuck undersized for 1910.010640, current state active+undersized, last acting [2,0]

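To see exactly which OSDs a single placement group maps to (its up and acting sets), an individual PG can also be inspected; a minimal sketch for one of the PGs listed above:
[root@ceph1 ~]# ceph pg 12.8 query
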
. Q4 @2 R7 f" U- c进一步查看集群osd状态树,发现ceph2和cepn3宕机再恢复后,osd.1 和osd.2进程已不在ceph2和cepn3上。* i+ j* ]6 W" k7 O7 Z1 N3 A! T
[root@ceph1 ~]# ceph osd tree' ~; n, u) C! f% n3 l* {
ID CLASS WEIGHT  TYPE NAME               STATUS REWEIGHT PRI-AFF
: A: D6 |+ C2 n3 b( i-1       0.24239 root default                                    . Z; {0 _5 Y9 Z
-9       0.16159     host centos7evcloud                        
- r* R' m8 Z  |3 g 1   hdd 0.08080         osd.1               up  1.00000 1.00000 # p4 M& n$ h" t. W) u# ]+ \: @
2   hdd 0.08080         osd.2               up  1.00000 1.00000
1 l; K3 w  P5 u  ]0 k' P-3       0.08080     host ceph1                                  9 ?! R/ _- s3 t8 l
0   hdd 0.08080         osd.0               up  1.00000 1.00000 2 V+ v' o+ L& }. {7 K9 E# g! {- ^
-5             0     host ceph2                                  + B. \% p- m) [2 f4 ]2 a
-7             0     host ceph3
$ W: q( C: P. A+ U3 w
. w: ~# C4 i" w! F$ j1 h分别查看osd.1 和osd.2服务状态。
5 [* T5 O! e4 R. f+ U. j) Z1 E4 }! d1 m, B
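For example, a quick way to do this (a sketch, assuming the standard ceph-osd@<id> systemd unit names):
[root@ceph1 ~]# ssh ceph2 systemctl status ceph-osd@1.service
[root@ceph1 ~]# ssh ceph3 systemctl status ceph-osd@2.service
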
Fix:
Log in to ceph2 and ceph3 and restart the osd.1 and osd.2 services so that they are mapped back onto the ceph2 and ceph3 nodes.
[root@ceph1 ~]# ssh ceph2
[root@ceph2 ~]# systemctl restart ceph-osd@1.service
[root@ceph2 ~]# ssh ceph3
[root@ceph3 ~]# systemctl restart ceph-osd@2.service

  T3 c3 `% M  `* f1 P8 F8 z最后查看集群osd状态树发现这两个服务重新映射到ceph2和ceph3节点中。
+ O, M1 U! L' ~4 ~, b2 I8 m[root@ceph3 ~]# ceph osd tree
$ \# f8 q$ Z- x, `  cID CLASS WEIGHT  TYPE NAME               STATUS REWEIGHT PRI-AFF - q0 M5 Y+ y3 ~' K' O) }
-1       0.24239 root default                                    , p! Q; a: C, W; Y+ T
-9             0     host centos7evcloud                        
( F' \8 L6 k3 ~-3       0.08080     host ceph1                                  ( k, F! f8 y. L( D& r/ s% G; Q
0   hdd 0.08080         osd.0               up  1.00000 1.00000 ( T( o' M7 @0 n8 q& `
-5       0.08080     host ceph2                                  3 P! M4 F2 W2 l* v
1   hdd 0.08080         osd.1               up  1.00000 1.00000
) K+ m! a; w/ U: \# Y; }-7       0.08080     host ceph3                                 
1 ]( z0 X1 @5 q$ m' W& v; d1 k, B 2   hdd 0.08080         osd.2               up  1.00000 1.00000  V# Q1 R1 F/ u9 C  R" q- s( \

" V( P8 J" ]4 p% `( j- q集群状态也显示了久违的HEALTH_OK。" Z9 v* G1 y" W) x7 b0 H
[root@ceph3 ~]# ceph -s
  cluster:
    id:     13430f9a-ce0d-4d17-a215-272890f47f28
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph2,ceph1,ceph3
    mgr: ceph1(active), standbys: ceph2, ceph3
    mds: cephfs-1/1/1 up  {0=ceph1=up:active}
    osd: 3 osds: 3 up, 3 in
    rgw: 1 daemon active

  data:
    pools:   8 pools, 288 pgs
    objects: 241  objects, 3.6 KiB
    usage:   3.1 GiB used, 245 GiB / 248 GiB avail
    pgs:     288 active+clean

6. Error when remounting CephFS after unmounting it
The mount command is:
mount -t ceph 10.0.86.246:6789,10.0.86.221:6789,10.0.86.253:6789:/ /mnt/mycephfs/ -o name=admin,secret=AQBAI/JbROMoMRAAbgRshBRLLq953AVowLgJPw==

Remounting CephFS after unmounting it fails with: mount error(2): No such file or directory
Note: first check that the /mnt/mycephfs/ directory exists and is accessible. In my case it did exist but the mount still failed with No such file or directory. However, after I restarted the OSD services it unexpectedly started working again and CephFS could be mounted normally.
[root@ceph1 ~]# systemctl restart ceph-osd.target
[root@ceph1 ~]# mount -t ceph 10.0.86.246:6789,10.0.86.221:6789,10.0.86.253:6789:/ /mnt/mycephfs/ -o name=admin,secret=AQBAI/JbROMoMRAAbgRshBRLLq953AVowLgJPw==

The mount succeeds:
[root@ceph1 ~]# df -h
Filesystem                                            Size  Used Avail Use% Mounted on
/dev/vda2                                              48G  7.5G   41G  16% /
devtmpfs                                              1.9G     0  1.9G   0% /dev
tmpfs                                                 2.0G  8.0K  2.0G   1% /dev/shm
tmpfs                                                 2.0G   17M  2.0G   1% /run
tmpfs                                                 2.0G     0  2.0G   0% /sys/fs/cgroup
tmpfs                                                 2.0G   24K  2.0G   1% /var/lib/ceph/osd/ceph-0
tmpfs                                                 396M     0  396M   0% /run/user/0
10.0.86.246:6789,10.0.86.221:6789,10.0.86.253:6789:/  249G  3.1G  246G   2% /mnt/mycephfs

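As a side note, passing the key with secret= on the command line leaves it in the shell history; the kernel client also accepts a secretfile option instead. A sketch, assuming the admin key has been saved to a hypothetical /etc/ceph/admin.secret file:
# /etc/ceph/admin.secret is a placeholder path containing only the base64 key
[root@ceph1 ~]# mount -t ceph 10.0.86.246:6789,10.0.86.221:6789,10.0.86.253:6789:/ /mnt/mycephfs/ -o name=admin,secretfile=/etc/ceph/admin.secret
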
More notes to be added...
=========================================================================
Summary:
When checking the cluster status reveals an error or warning, the ceph health detail command will usually show the remediation advice the system gives. Following those suggestions resolves most problems a cluster runs into.