libnginx-mod-nchan distributes old/incompatible module

Bug #1874831 reported by Damjan Georgievski
18
This bug affects 2 people
Affects Status Importance Assigned to Milestone
nginx (Ubuntu)
Fix Released
Medium
Unassigned
Focal
Incomplete
Medium
Unassigned
Groovy
Won't Fix
Medium
Unassigned

Bug Description

The libnginx-mod-nchan package which is built from the nginx source package distributes an old nchan module. The nginx source package has bundled nchan 1.0.8 which has known issues when running with nginx >= 1.13.10

The upstream issue documents that nginx has had some changes that break nchan < v1.2.0
https://github.com/slact/nchan/issues/460

Suggestion, update nchan to >= v1.2.0 (best v1.2.7).

this is the error I get when using the `nchan_publisher_upstream_request` directive:

2020/04/24 01:00:35 [error] 9#9: *6 nchan: unexpected publisher message request body buffer location. please report this to the nchan developers. while sending to client, client: 127.0.0.1, server: _, request: "POST /publish?topic=one HTTP/1.1", subrequest: "/authorize_pub", upstream: "http://127.0.0.1:5000/authorize_pub", host: "127.0.0.1"

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: libnginx-mod-nchan 1.17.10-0ubuntu1
Uname: Linux 5.6.6-arch1-1 x86_64
ApportVersion: 2.20.11-0ubuntu27
Architecture: amd64
CasperMD5CheckResult: skip
Date: Fri Apr 24 16:53:56 2020
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
SourcePackage: nginx
UpgradeStatus: No upgrade log present (probably fresh install)

Related branches

Revision history for this message
Damjan Georgievski (gdamjan) wrote :
Changed in nginx (Ubuntu):
status: New → Triaged
Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

Thanks for reporting this...

I have opened an issue upstream (Debian):

https://salsa.debian.org/nginx-team/nginx/-/issues/2

Let's see what maintainers say and then we check how to move further.

Changed in nginx (Ubuntu):
importance: Undecided → Medium
Changed in nginx (Ubuntu Focal):
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
Thomas Ward (teward) wrote :

We are sufficiently diverged... I would have to evaluate any module changes in terms of features for an SRU but I can queue an updated module in the packaging for Ubuntu as well.

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

Thanks a lot for checking this @teward! Your call, I'd say only if you think its worth the time (versus benefit). Up to you! This will be in server's team queue until its incomplete/closed. Cheers!

Revision history for this message
Thomas Ward (teward) wrote :

rafaeldtinoco: I've added this to my queue for Groovy, which means I just have more package changes to implement heh.

Groovy set to In Progress for now as i'm already working on that packaging.

Changed in nginx (Ubuntu Groovy):
assignee: nobody → Thomas Ward (teward)
status: Triaged → In Progress
Revision history for this message
Thomas Ward (teward) wrote :

Hello.

There is a SUBSTANTIAL delta between the way things're packaged wrt nchan. As such, I've had to make minor alterations to the nchan source (removal of some binary files). AN updated nginx for Groovy has been uploaded to a PPA - https://launchpad.net/~teward/+archive/ubuntu/nginx-test - and soon a Focal upload will land there.

**Please test to make sure nchan works with these changes as intended.** If it does not, then this will need additional effort to make fully functional.

Revision history for this message
Damjan Georgievski (gdamjan) wrote :

I guess the Focal packages didn't appear yet? I'll try to test with the groovy package on top of Focal docker.

Revision history for this message
Damjan Georgievski (gdamjan) wrote :

I've tested my scenario and it works fine.

Just for completeness, tested on ubuntu focal 20.04 (amd64) docker image +
libnginx-mod-nchan 1.18.0-0ubuntu2 amd64
nginx-light 1.18.0-0ubuntu2 amd64

nchan is version 1.2.7

rhe configuration snippet used is:

location /publish {
  nchan_publisher;
  nchan_channel_id $arg_topic;
  nchan_publisher_upstream_request /authorize_pub;
}

Revision history for this message
Damjan Georgievski (gdamjan) wrote :

I see that ubuntu focal (focal-updates) got an libnginx-mod-nchan (1.18.0-0ubuntu1) package, but that one is still based on nchan 1.0.8 which crashes on publish.

Revision history for this message
Thomas Ward (teward) wrote :

I had to wait for this merge to be complete before I could consider moving forward.

Given that the issues filed on the Salsa page may not be present, I will contact the Debian maintainer directly and see if they're aware of this issue. If they can fix it in Debian and make it available, we can just merge from Debian and get the fix.

Changed in nginx (Ubuntu Groovy):
status: In Progress → Triaged
Revision history for this message
Thomas Ward (teward) wrote :
Revision history for this message
Thomas Ward (teward) wrote :

This was *supposedly* fixed in 1.18.0-1 in Debian - how this didn't make it into the merge I have no idea.

Changed in nginx (Ubuntu Groovy):
status: Triaged → In Progress
Revision history for this message
Bryce Harrington (bryce) wrote :

Hi Thomas, could you elaborate? Is there some delta we should apply for the ubuntu package?

Revision history for this message
Thomas Ward (teward) wrote :

I was digging and in Groovy the version works fine.

The issue will be Focal, we'll have to SRU the fix for that module only.

Thomas Ward (teward)
Changed in nginx (Ubuntu):
assignee: Thomas Ward (teward) → nobody
Changed in nginx (Ubuntu Groovy):
assignee: Thomas Ward (teward) → nobody
status: In Progress → Triaged
Changed in nginx (Ubuntu):
status: In Progress → Triaged
Revision history for this message
Brian Murray (brian-murray) wrote :

The Groovy Gorilla has reached end of life, so this bug will not be fixed for that release

Changed in nginx (Ubuntu Groovy):
status: Triaged → Won't Fix
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Hi - while trying to clear old forgotten bugs it seems the SRU of this was missed in the past.

TL;DR: the new nginx is incompatible with the integrated nchan <=1.2.0
This fixes it and is in new releases
https://salsa.debian.org/nginx-team/nginx/-/commit/b575cfe355931e42fb61febe58d06076a8b10725

But the same should be tried to go to Focal.

OTOH Focal had a major bump already:
 nginx | 1.17.10-0ubuntu1 | focal | source, all
 nginx | 1.18.0-0ubuntu1.2 | focal-security | source, all
 nginx | 1.18.0-0ubuntu1.2 | focal-updates | source, all

This also did not yet add that.

I must admit - I might not know what I'm doing here, but the patch applies fine and would bump the nchan module to a compatible version.
And the patching/building seems easy enough so let me try.

PPA:
https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/4678/+packages

MP:
https://code.launchpad.net/~paelzer/ubuntu/+source/nginx/+git/nginx/+merge/409828

@Damjan - I know it has been a while, but could you give this Focal-PPA a check if it would resolve the issue please? Also for an SRU we will need "steps to reproduce" if you could outline them for an nginx-newbie like me that would be helpful. I assume it is "install nginx + enable nchan module" or is there more to it?

Now if that would have any negative implications or SRU considerations I'm not sure.
Could it have happened that nchan in some config worked and now changes behavior (breaking SRU) or is it juts "module didn't work in Focal ever, so no one can be affected?
Teward can you answer that?

tags: added: server-todo
Changed in nginx (Ubuntu Focal):
assignee: nobody → Christian Ehrhardt  (paelzer)
Changed in nginx (Ubuntu):
status: Triaged → Fix Released
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I found a rather similar case (bug 1893753) which changes things slightly.
The PPA stays the same, but the MP now is at:
https://code.launchpad.net/~paelzer/ubuntu/+source/nginx/+git/nginx/+merge/409830

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

FYI - things were not as easy and build fails block the PPA for now.
I'll let you know once I found some time to resolve that.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Build fixed (as all about lua, nchan was fine). Could affected people with real use-cases please give the build in the PPA [1] a try and report if it works for them?

[1]: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/4678

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Hi,
there was no feedback yet, did anyone have a chance to try this build and outline steps to test this as needed for the SRU process [1]?

I tried the following (probably too trivial) and it worked just fine even without the fix.

$ apt install libnginx-mod-http-lua nginx-core
For nchan add in /etc/nginx/sites-enabled/default
    location = /sub {
      nchan_subscriber;
      nchan_channel_id $arg_id;
    }

    location = /pub {
      nchan_publisher;
      nchan_channel_id $arg_id;
    }
# Restart server
$ sudo systemctl restart nginx
# Access nchan content
$ curl --request POST --data "test message" http://127.0.0.1:80/pub
queued messages: 1
last requested: 0 sec. ago
active subscribers: 0
last message id: 1636616299

Maybe you can help me to extend that example to show the issue?

P.S. there is activity at the sibling bug 1893753. If we only get confirmation and proper tests steps there, but not here - then I'll have to drop this part of the upload before queuing for an SRU to Focal. So your help would really be appreciated.

[1]: https://wiki.ubuntu.com/StableReleaseUpdates#SRU_Bug_Template

Changed in nginx (Ubuntu Focal):
status: Triaged → Incomplete
Revision history for this message
Damjan Georgievski (gdamjan) wrote :

Tested with the packages from https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/4678
and it works for me

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Thanks Damjan that is great news, any chance to enlight me how to test it myself so that I could summarize that for the SRU Template as test steps?

Revision history for this message
Damjan Georgievski (gdamjan) wrote :

In my original test, the POST to the /pub would crash the request in nginx - not crash the whole process but it would error out, and the curl would get disconnected with no data received.

Just now, before testing your PPA packages, using latest focal packages, the POST to /pub succeeded, but then a GET request to /sub crashed nginx:

nginx: worker process: /build/nginx-KTLRnK/nginx-1.18.0/debian/modules/nchan/src/store/memory/ipc-handlers.c:417: memstore_ipc_send_get_message: Assertion `data.shm_chid->len>1' failed.
2021/11/11 12:12:15 [alert] 915#915: worker process 916 exited on signal 6 (core dumped)

In general, a POST to /pub and a GET on /sub should succeed and return the same message.

I have a full demo with Docker and simple html app here (based on bullseye, since that worked at the time):
https://github.com/gdamjan/pub-sub-demo

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

That simple post+sub also works fine unfortunately:

root@f:~# curl --request POST --data "test message" http://127.0.0.1:80/pub
queued messages: 2
last requested: 6 sec. ago
active subscribers: 0
last message id: 16367017

root@f:~# curl --request GET http://127.0.0.1:80/sub
test message

Using your linked examples (thanks) I got

curl -i --header "Content-Type: application/json" --data '{"message":"test 1"}' http://localhost:80/pub?topic=one
HTTP/1.1 202 Accepted
Server: nginx/1.18.0 (Ubuntu)
Date: Fri, 12 Nov 2021 07:25:06 GMT
Content-Type: text/plain
Content-Length: 101
Connection: keep-alive

queued messages: 3
last requested: 97 sec. ago
active subscribers: 0
last message id: 1636701906:0

root@f:~# curl -i --header 'Accept: text/event-stream' --header "Content-Type: application/json" --get --data-urlencode 'payload={"topics":["one","two"]}' http://localhost:80/sub
HTTP/1.1 200 OK
Server: nginx/1.18.0 (Ubuntu)
Date: Fri, 12 Nov 2021 07:26:09 GMT
Content-Type: text/event-stream; charset=utf-8
Connection: keep-alive

: hi

id: 1636701724:0
data: test message

id: 1636701787:0
data: test message2

id: 1636701906:0
data: {"message":"test 1"}

id: 1636701939:0
data: {"message":"test 2"}

This latter one even works to stay active and further messages posted later arrive here.

Still no sign of the reported breakage.
So it must be part of the setup/config that is done I guess ...

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I tried to use more of your docker config, but but either worked without a crash or didn't work (but because of the proxy/flask upstream_app being incompletely set up).
Is there a way to trigger this without the app, or do we really need to deploy it (or something similar) to recreate this.

BTW - since you are active (thanks) and have a reproducible way on your side at least and we can "explain" it with your container setup we can try to drive the SRU even with that if we fail to isolate better steps.

But I'd with we get something similar for lua in bug 1893753 ...

Revision history for this message
Damjan Georgievski (gdamjan) wrote :

sorry, just went to my first message, the issue is when also using the nchan_publisher_upstream_request directive (to authorize the publish request).

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I was able to recreate the error now.
I managed to trigger that by subscribing after a server restart with no message posted:

$ sudo systemctl restart nginx
$ sudo tail -f /var/log/nginx/error.log
# wait here then in another console
$ curl -i --header 'Accept: text/event-stream' --header "Content-Type: application/json" --get --data-urlencode 'payload={"topics":["one","two"]}' http://localhost:8000/subscribe
# The error log will spit out:
ker process: /build/nginx-KTLRnK/nginx-1.18.0/debian/modules/nchan/src/store/memory/ipc-handlers.c:417: memstore_ipc_send_get_message: Assertion `data.shm_chid->len>1' failed.
2021/11/15 11:09:07 [alert] 5340#5340: worker process 5344 exited on signal 6 (core dumped)

And that is indeed fixed by the PPA.

But BTW if you point nchan_publisher_upstream_request to an invalid target, then even with the fix it does fail the same way. And actually with the new version I had a hard time to get it right. I have found cases which formerly worked (basic nchan config post-pub + get+sub) which now after the fix is installed trigger a crash like this. This is kind of the very definition of what the SRU policy tries to prevent :-/

The argument for "allowing" the full version bump of this embedded module was that without this it isn't usable at all (therefore we can't regress anyone). But as we've seen it seems there are (basic?) configurations which were fine before, therefore I'm not entirely sure we can really push this as an SRU :-/

@teward @racb - would you mind to have a look and comment what you think about it?

Revision history for this message
Robie Basak (racb) wrote :

I think Christian's feeling is right.

SRU policy is to cherry-pick specific fixes. Normally we'd expect a bug report to have identified a root cause in the code, and we'd be patching that root cause to fix the problem.

Identifying that one version works and one version does not, and therefore we should swap the version in a stable release, doesn't comply with this requirement and isn't normally acceptable because there usually exist users not affected by the bug who might find that behaviour for them changes, contrary to their expectations of a stable release. It's also incomplete to just identify version differences like this: there is always going to be a root cause in the code, so it should always be possible to turn this case into the first case above and therefore to find a way to update the package with which we can be more confident doesn't disrupt existing unaffected users.

Exceptionally, if we can be certain that there aren't any existing unaffected users, then we can bump the version safely. Or if it's not practical to do the above step, we might decide that the "least worst" solution is to take that risk. But then I'd expect the case to be made that there are no existing unaffected users and therefore there can be no disruption by the version bump. Or an explanation of why it is not practical to turn this into the first case and risking regression to existing users is "least worst".

Christian said:

> But as we've seen it seems there are (basic?) configurations which were fine before, therefore I'm not entirely sure we can really push this as an SRU :-/

That does sound to me like a cherry-pick is required here, unless a case is made for the exceptions I outlined above.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Thanks for confirming my bad feeling here Robie :-/

I might not know enough of real nchan/nginx usage, if we can make it clear that the old version in focal really is useless for any real case then we can SRU the new version.

Otherwise we'd really need to find the individual fix to pick it onto what we have in Focal.

@Damjan - can you help to differ between the above two cases?

Revision history for this message
Damjan Georgievski (gdamjan) wrote :

What is SRU?

Revision history for this message
Robie Basak (racb) wrote :

Sorry Damjan, I should have explained that. SRU is a Stable Release Update - the policy and process we use in Ubuntu to update packages in stable releases. https://wiki.ubuntu.com/StableReleaseUpdates has full details.

tags: removed: server-todo
Changed in nginx (Ubuntu Focal):
assignee: Christian Ehrhardt  (paelzer) → nobody
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.