This article explores why any ServiceAccount in the global cluster of TKEStack 1.6 has cluster-admin privileges.

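TKEStack v1.6.0 has been released without any major changes, but while using it we noticed a strange behavior: any ServiceAccount in the global cluster can access every resource in the cluster. This is easy to verify with kubectl auth can-i: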

On a 1.5.0 cluster:

$ kubectl auth can-i create pod --as=system:serviceaccount:tke:fake
no

On a 1.6.0 cluster:

$ kubectl auth can-i create pod --as=system:serviceaccount:tke:fake
yes

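We had never even created a ServiceAccount named fake. Given this behavior, our first suspicion was that the authorization configuration of kube-apiserver had changed. Checking the kube-apiserver flags confirmed it: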

--authorization-mode=Node,RBAC,Webhook

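In 1.5.0 the flag was only Node,RBAC, so the Webhook mode is the likely culprit. kube-apiserver consults the authorization modes in order and stops at the first one that allows or denies the request; RBAC only ever allows or abstains, so any request RBAC does not allow falls through to the Webhook, which can then allow it. The official Kubernetes documentation explains how Webhook authorization works. Next we looked at the Webhook configuration, which is set by --authorization-webhook-config-file=/etc/kubernetes/tke-authz-webhook.yaml. Its contents are: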

  tke-authz-webhook.yaml: |
    apiVersion: v1
    kind: Config
    clusters:
      - name: tke
        cluster:
          certificate-authority: /app/certs/ca.crt
          server: http://{vip}:31138/auth/authz
    users:
      - name: admin-cert
        user:
          client-certificate: /app/certs/admin.crt
          client-key: /app/certs/admin.key
    current-context: tke
    contexts:
    - context:
        cluster: tke
        user: admin-cert
      name: tke    

The webhook endpoint is http://{vip}:31138/auth/authz, which is the NodePort of tke-auth-api.

Next we traced the authorization setup in tke-auth-api, starting from /cmd/tke-auth-api/app/app.go:

cfg, err := config.CreateConfigFromOptions(basename, opts)

tke-auth-api/app/config/config.go

aggregateAuthz, err := aggregation.NewAuthorizer(authClient, opts.Authorization, opts.Auth, enforcer, opts.Authentication.PrivilegedUsername)

/opt/project/tke/pkg/auth/authorization/aggregation/aggregation.go

// NewAuthorizer creates an authorizer for subject access review and returns it.
func NewAuthorizer(authClient authinternalclient.AuthInterface, authorizationOpts *options.AuthorizationOptions, authOpts *options.AuthOptions, enforcer *casbin.SyncedEnforcer, privilegedUsername string) (authorizer.Authorizer, error) {
	var (
		authorizers []authorizer.Authorizer
	)

	if len(authorizationOpts.WebhookConfigFile) != 0 {
		webhookAuthorizer, err := webhook.New(authorizationOpts.WebhookConfigFile,
			authorizationOpts.WebhookVersion,
			authorizationOpts.WebhookCacheAuthorizedTTL,
			authorizationOpts.WebhookCacheUnauthorizedTTL, nil)
		if err != nil {
			return nil, err
		}

		authorizers = append(authorizers, webhookAuthorizer)
	}

	if len(authorizationOpts.PolicyFile) != 0 {
		abacAuthorizer, err := abac.NewABACAuthorizer(authorizationOpts.PolicyFile)
		if err != nil {
			return nil, err
		}
		authorizers = append(authorizers, abacAuthorizer)
	}

	authorizers = append(authorizers, local.NewAuthorizer(authClient, enforcer, privilegedUsername))

	return union.New(authorizers...), nil
}

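The final authorizer is therefore the union of the webhook authorizer (if configured), the ABAC authorizer (if configured), and the local authorizer. The relevant part of the tke-auth-api configuration file: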

  tke-auth-api.toml: |
    ........
    [authorization]
    policy_file="/app/conf/abac-policy.json"
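Because policy_file is set, NewAuthorizer appends the ABAC authorizer to the union ahead of the local authorizer. The ordering matters, and the Go sketch below is a simplified, stdlib-only model of it, assuming union.New behaves like k8s.io/apiserver/pkg/authorization/union: authorizers are consulted in order, the first Allow or Deny wins, and NoOpinion falls through to the next one. Decision, Authorizer, and the abac/local functions are hypothetical stand-ins rather than the real TKEStack or Kubernetes types, and the optional webhook authorizer is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// Decision models the three outcomes an authorizer can return.
type Decision int

const (
	NoOpinion Decision = iota
	Allow
	Deny
)

func (d Decision) String() string {
	switch d {
	case Allow:
		return "Allow"
	case Deny:
		return "Deny"
	default:
		return "NoOpinion"
	}
}

// Authorizer is a stand-in for authorizer.Authorizer that only sees the user name.
type Authorizer func(user string) Decision

// union consults authorizers in order and returns the first Allow or Deny;
// if every authorizer abstains, the union abstains too.
func union(authorizers ...Authorizer) Authorizer {
	return func(user string) Decision {
		for _, authz := range authorizers {
			if d := authz(user); d != NoOpinion {
				return d
			}
		}
		return NoOpinion
	}
}

func main() {
	localCalled := false
	// Hypothetical ABAC stand-in: allows any user name starting with "system:".
	abac := func(user string) Decision {
		if strings.HasPrefix(user, "system:") {
			return Allow
		}
		return NoOpinion
	}
	// Hypothetical local stand-in: records that it was consulted, then abstains.
	local := func(user string) Decision {
		localCalled = true
		return NoOpinion
	}
	authz := union(abac, local)
	fmt.Println(authz("system:serviceaccount:tke:fake"), localCalled)
}
```

Running it prints Allow false: once ABAC allows the request, the local authorizer is never consulted, so whatever the local policy says about the fake ServiceAccount has no effect.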

Next, look at abac-policy.json:

{"apiVersion":"abac.authorization.kubernetes.io/v1beta1","kind":"Policy","spec":{"user":"system:*","namespace":"*", "resource":"*","apiGroup":"*", "group": "*", "nonResourcePath":"*"}}

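This policy grants any user matching system:* access to all resources in all namespaces, and since readonly is not set, every verb is allowed. Every ServiceAccount username has the form system:serviceaccount:<namespace>:<name>, so every ServiceAccount matches this rule, even one that was never created.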

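At this point the root cause is clear. But why does TKEStack configure ABAC this way? It opens an obvious security hole. With that question in mind, we went through the commit history on GitHub: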