Winbind can no more connect to Windows domain after reload

Bug #1960821 reported by Norbert B.
This bug affects 1 person
Affects         Status  Importance  Assigned to  Milestone
samba (Ubuntu)  New     Undecided   Unassigned

Bug Description

Problem description:

After a reload, winbind can no longer connect to the Windows domain and slows down other services on the system (possibly only authentication services).
It happened for the first time on January 13, 2022.
After that it happened roughly once per week, which led us to find that it coincided with log rotation and the reload of winbind.
It is reproducible on our systems with:
/usr/bin/smbcontrol winbindd reload-config
(as is done in logrotate).

Effect:

1. Winbind loses its connection to the Windows domain and starts to log:
[2022/02/14 11:00:13.872687, 1] ../source3/winbindd/winbindd_cm.c:1258(cm_prepare_connection)
  Failed to prepare SMB connection to DC2-CHILD1.child1.parent.cloud: NT_STATUS_IO_TIMEOUT
[2022/02/14 11:00:33.147954, 1] ../source3/winbindd/winbindd_cm.c:1229(cm_prepare_connection)
  failed tcon_X with NT_STATUS_IO_TIMEOUT

2. Side effects:
- SSH authentication is very slow (SSH login takes minutes or fails)
- SFTP connections time out
- Other services (like Apache) slow down or become unreachable (timeout)

3. The problem disappears after a restart of winbind, but in this case the restart takes a very long time:
time systemctl restart winbind
real 1m30.285s
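
Both log messages in item 1 carry the NT_STATUS_IO_TIMEOUT status, so counting that string in the winbind log is one quick way to see whether a reload triggered the failure. This is a minimal sketch: the helper name count_dc_timeouts is hypothetical, and the log path in the example call is an assumption (winbindd may also write per-domain logs elsewhere, depending on the "log file" setting).

```shell
#!/bin/sh
# count_dc_timeouts is a hypothetical helper name, not part of Samba.
# It prints how many lines of the given log file contain NT_STATUS_IO_TIMEOUT.
count_dc_timeouts() {
    # grep -c prints the number of matching lines (0 if none match)
    grep -c 'NT_STATUS_IO_TIMEOUT' "$1"
}

# Example call; the log path is an assumption:
# count_dc_timeouts /var/log/samba/log.winbindd
```

Comparing the count before and after running the reload command shows whether new timeouts were logged.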

Currently we have a workaround in /etc/logrotate.d/winbind:
#/usr/bin/smbcontrol winbindd reload-config
/bin/systemctl restart winbind
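
A possible variant of this workaround keeps the lighter reload and restarts winbind only if a domain controller check fails afterwards. This is a sketch under assumptions, not the reporter's method: the helper name reload_or_restart and both default delays are hypothetical, and whether "wbinfo --ping-dc" actually detects the failure described here has not been tested.

```shell
#!/bin/sh
# reload_or_restart is a hypothetical helper, not part of Samba or logrotate.
# $1: seconds to wait after the reload before checking (default 60, an assumption)
# $2: timeout in seconds for the DC check (default 30, an assumption)
reload_or_restart() {
    wait_s=${1:-60}
    check_s=${2:-30}

    # Same reload command that the logrotate script issues.
    smbcontrol winbindd reload-config

    sleep "$wait_s"

    # wbinfo --ping-dc checks the secure channel to the domain controller;
    # timeout(1) stops the check from hanging if winbindd is unresponsive.
    if timeout "$check_s" wbinfo --ping-dc >/dev/null 2>&1; then
        echo "winbind OK after reload"
    else
        echo "DC check failed, restarting winbind"
        systemctl restart winbind
    fi
}
```

Called as "reload_or_restart" from the postrotate script, this would fall back to the full restart only when needed, at the cost of the services being slow during the wait.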

Operating System: Ubuntu 18.04.6 LTS
Kernel: Linux 5.4.0-1063-oracle
Samba: Version 4.7.6-Ubuntu 2:4.7.6+dfsg~ubuntu-0ubuntu2.28
(The problem happened in 4.7.6+dfsg~ubuntu-0ubuntu2.27 too)

Samba config (relevant parts):

[global]
   workgroup = PARENT
   security = ADS
   realm = PARENT.CLOUD

   idmap config * : backend = tdb
   idmap config * : range = 3000-99999

   idmap config PARENT : backend = rid
   idmap config PARENT : range = 100000-199999

   idmap config CHILD1 : backend = rid
   idmap config CHILD1 : range = 200000-299999

   idmap config CHILD2 : backend = rid
   idmap config CHILD2 : range = 300000-399999

   idmap config CHILD3 : backend = rid
   idmap config CHILD3 : range = 400000-499999

   min domain uid = 0
   username map = /etc/samba/user.map

   winbind refresh tickets = Yes
   vfs objects = acl_xattr
   map acl inherit = Yes
   store dos attributes = Yes

   dedicated keytab file = /etc/krb5.keytab
   kerberos method = secrets and keytab

Norbert B. (nbpq) wrote :

Additional:
Disabling "min domain uid = 0" does not solve the problem.

Andreas Hasenack (ahasenack) wrote :

It will be difficult to try to reproduce this problem here, with multiple domains interconnected like that.

Did you start experiencing this problem recently, after some update, or has it always been like this?

I see this remark in the winbindd manpage:
       SIGHUP
           Reload the smb.conf(5) file and apply any parameter changes to the running version
           of winbindd. This signal also clears any cached user and group information. The
           list of other domains trusted by winbindd is also reloaded.

That's what is used for log rotation. I suspect that in your case the erased cache plays a big role in winbindd's performance. Note how it also reconnects to all trusted domains after this signal.

Have you tried setting winbind max domain connections to a value higher than 1?

Does winbind ever recover itself from the reload effects after a while, or only after a restart?

Utkarsh Gupta (utkarsh)
Changed in samba (Ubuntu):
status: New → Incomplete

Norbert B. (nbpq) wrote :

Hello Andreas,

the problem first came up on 2022-01-16 and has persisted since.
The previous winbind updates were:

2021-05-01 06:23:05 upgrade winbind:amd64 2:4.7.6+dfsg~ubuntu-0ubuntu2.21 2:4.7.6+dfsg~ubuntu-0ubuntu2.23
2021-12-08 07:00:42 upgrade winbind:amd64 2:4.7.6+dfsg~ubuntu-0ubuntu2.23 2:4.7.6+dfsg~ubuntu-0ubuntu2.26
2021-12-14 06:59:55 upgrade winbind:amd64 2:4.7.6+dfsg~ubuntu-0ubuntu2.26 2:4.7.6+dfsg~ubuntu-0ubuntu2.27
2022-02-02 06:01:10 upgrade winbind:amd64 2:4.7.6+dfsg~ubuntu-0ubuntu2.27 2:4.7.6+dfsg~ubuntu-0ubuntu2.28

In 2.23 it was definitely not present.
In 2.26 there was one log rotation with reload without problems.
In 2.27 there were some log rotations without problems, but then it started on January 16.

On January 13 we had lots of upgrades.
Could it be caused by another upgraded package?

winbind max domain connections = 10
didn't solve the problem.

Winbind does not recover for hours.
We restarted it at the first occurrence after 2-3 hours with hanging services.

Changed in samba (Ubuntu):
status: Incomplete → New

Norbert B. (nbpq) wrote :

Meanwhile I did a test with only one domain (the parent domain):
The problem persists; there is no difference from the configuration with multiple domains.

Norbert B. (nbpq) wrote :

For further tests I could downgrade samba to 2:4.7.6+dfsg~ubuntu-0ubuntu2.23, but this package is no longer available in the Oracle repositories.
Is there a backport for these packages?

Lucas Kanashiro (lucaskanashiro) wrote :

You can find the binaries of samba/2:4.7.6+dfsg~ubuntu-0ubuntu2.23 here (clicking on the architecture you are using):

https://launchpad.net/ubuntu/+source/samba/2:4.7.6+dfsg~ubuntu-0ubuntu2.23

And you could download and install them if you want.

@Andreas do you think this is something reproducible on your side? @Norbert mentioned the issue is still present with a single domain.

Andreas Hasenack (ahasenack) wrote :

Maybe with a big domain. How many users and groups are we talking about? ballpark

Norbert B. (nbpq) wrote :

Overall fewer than 500 users (300-400)

Norbert B. (nbpq) wrote :

...and it worked with the same users until January

Norbert B. (nbpq) wrote :

Hello Lucas,

I don't really want to compile an older Samba version or install several individual binaries, because I'm testing on our second production fallback node and we need it in a working and stable condition at all times.

So do you know any backport repositories with these versions?
With this solution I could downgrade and later upgrade again without the risk of destroying my production environment.

(Why doesn't Oracle keep older versions in their repositories?)

Lena Voytek (lvoytek) wrote :

Hello,

Here is a PPA for version 2:4.7.6+dfsg~ubuntu-0ubuntu2.23 that will allow you to test the previous version and upgrade again later. It should be easier than downloading and installing manually.

https://launchpad.net/~sergiodj/+archive/ubuntu/winbind-bug1960821

Norbert B. (nbpq) wrote :

Hello Lena,

sorry for the late reply but I was on vacation last week.

Thanks for the repository - very nice of you.
Currently it's not working because of some missing dependencies.
Would it be much work to add the missing modules to the testing repository?

apt install samba=2:4.7.6+dfsg~ubuntu-0ubuntu2.23test~ppa1 samba-common=2:4.7.6+dfsg~ubuntu-0ubuntu2.23test~ppa1 samba-common-bin=2:4.7.6+dfsg~ubuntu-0ubuntu2.23test~ppa1 samba-libs=2:4.7.6+dfsg~ubuntu-0ubuntu2.23test~ppa1

The following packages have unmet dependencies:
 samba : Depends: python-samba but it is not going to be installed
         Recommends: samba-dsdb-modules but it is not going to be installed
         Recommends: samba-vfs-modules but it is not going to be installed
 samba-common-bin : Depends: python-samba but it is not going to be installed
 samba-libs : Depends: libwbclient0 (= 2:4.7.6+dfsg~ubuntu-0ubuntu2.23test~ppa1) but 2:4.7.6+dfsg~ubuntu-0ubuntu2.28 is to be installed

Lucas Kanashiro (lucaskanashiro) wrote :

Hi Norbert,

All the unmet dependencies you got are already available in the PPA. To be on the safe side, you might want to specify all of them, one by one, with the version from the PPA. This likely happened because all the dependencies need to be locked to the same version; since you did not list them with a version like the others, they did not match. Could you try again, pulling all those packages from the PPA?
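
Listing every package with the same version string can be sketched as below. The helper name pin_packages is hypothetical, and the package list is an assumption: it contains only the names from the apt command and its error output, plus winbind, so an affected system may need more entries. The script prints the command rather than running it.

```shell
#!/bin/sh
# pin_packages VERSION PKG...: print "PKG=VERSION" for every package,
# separated by spaces, suitable as arguments to apt install.
# (Hypothetical helper name.)
pin_packages() {
    ver=$1
    shift
    out=''
    for pkg in "$@"; do
        # Append with a separating space once out is non-empty.
        out="${out:+$out }$pkg=$ver"
    done
    printf '%s\n' "$out"
}

# Version string of the PPA packages mentioned in this thread.
PPA_VERSION='2:4.7.6+dfsg~ubuntu-0ubuntu2.23test~ppa1'

# Assumed package list: names from the apt command and its error output,
# plus winbind. Other installed samba binary packages may be missing here.
PPA_PACKAGES='samba samba-common samba-common-bin samba-libs python-samba samba-dsdb-modules samba-vfs-modules libwbclient0 winbind'

# Print the command instead of running it, so it can be reviewed first.
# $PPA_PACKAGES is intentionally unquoted so it splits into separate words.
echo "apt install $(pin_packages "$PPA_VERSION" $PPA_PACKAGES)"
```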

Norbert B. (nbpq) wrote :

Hello Lucas,

thank you for the detailed description - it worked now.

Result:
In 2:4.7.6+dfsg~ubuntu-0ubuntu2.23test~ppa1 the error is not present or at least currently not triggerable. I think it is not present, because it began with the later versions.

So I upgraded back to 2:4.7.6+dfsg~ubuntu-0ubuntu2.28.

Now it becomes strange: currently (2.28) the error is not triggerable either.
Before the down- and upgrade it was triggerable with every /usr/bin/smbcontrol winbindd reload-config.
But: in the past the error also didn't happen on every logrotate run calling /usr/bin/smbcontrol winbindd reload-config, only on some.

So I can only wait to see if or when it happens again in 2.28.
Logrotate runs every Sunday.
Between 2.23 and 2.28 there was the big security fix for CVE-2020-25717.
Could that be a reason for this behaviour?

I'm sorry that I can't give you an explicit result.

Paride Legovini (paride) wrote :

Hello Norbert and thanks for the follow-up. I think it's worth checking that there are no leftover packages from the PPA. This shouldn't happen as the samba packages are tied to have the same version, but I think it's worth verifying. This can be done by:

  dpkg -l | grep ppa1

given that the packages from the PPA all have "ppa1" in their version string.
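
A complementary check is to confirm that all installed samba packages share a single version. The sketch below assumes input lines of the form "package version"; the helper name distinct_versions is hypothetical, and the dpkg-query call in the comment is an example that has not been run on the affected system.

```shell
#!/bin/sh
# distinct_versions: read "package version" lines on stdin and print each
# distinct version once, sorted. (Hypothetical helper name.)
distinct_versions() {
    awk '{ print $2 }' | sort -u
}

# Example call (not run here): select binary packages built from the samba
# source package and show which versions are present.
# dpkg-query -W -f='${source:Package} ${Package} ${Version}\n' |
#     awk '$1 == "samba" { print $2, $3 }' | distinct_versions
```

More than one line of output would mean the packages are not all at the same version.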

I'm marking this bug report as Incomplete to reflect that the issue is not well understood and we're waiting for more information about it.

Changed in samba (Ubuntu):
status: New → Incomplete

Norbert B. (nbpq) wrote :

Hello Paride,

there's no package from 2.23 left, all is 2.28 again.

Meanwhile the issue is present again and the behaviour is the same as in the first problem description.
I applied the workaround in /etc/logrotate.d/winbind again.

What else could I do to help you find the problem?

Norbert B. (nbpq)
Changed in samba (Ubuntu):
status: Incomplete → New