openjade segfaults on all arch

Bug #1869734 reported by Christian Ehrhardt 
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
openjade (Debian)
Fix Released
Unknown
openjade (Ubuntu)
Fix Released
Undecided
Unassigned
pgpool2 (Ubuntu)
Fix Released
Undecided
Unassigned

Bug Description

<Myon> cpaelzer: fun, openjade segfaulting on focal/arm64
<cpaelzer> I beg your pardon for my ignorance of the ecosystem, but how is this related to postgresql-apt
<cpaelzer> for doc generation?
<Myon> cpaelzer: the postgresql-9.5 and -9.6 builds fail during doc generation on focal/arm64
<Myon> everything else (other archs, other dists) is fine
<Myon> https://pgdgbuild.dus.dg-i.net/job/postgresql-9.5-binaries/architecture=arm64,distribution=focal/55/console

<cpaelzer> Myon: only on arm64?
<Myon> only there, yes

<Myon> ./configure && make -C doc/src/sgml all should reproduce it
<cpaelzer> clone and build this branch ? https://github.com/postgres/postgres/tree/REL9_5_STABLE
<Myon> the crash is in /usr/lib/libostyle.so.1 from libostyle1c2
<cpaelzer> here it is
<cpaelzer> segfault

<Myon> I have this now http://paste.debian.net/1136839/
<cpaelzer> https://paste.ubuntu.com/p/CfzrfWkqzs/

<cpaelzer> well I wanted to get -O0 to see more int he debugger
<cpaelzer> my build worked as well now
<Myon> the -O0 openjade installed doesn't segfault
<cpaelzer> oh so like my -O0 then
<cpaelzer> so maybe we should just always -O0 openjade then?
<cpaelzer> TBH who cares about optimization in that
<cpaelzer> But its usage is mostly build time and not runtime
<Myon> maybe -O0 on arm64 only
<cpaelzer> I'm rebulding -O2 again to recheck if that really is it
<cpaelzer> and then -O1 to check the in between
<cpaelzer> it might "just" need a rebuild

<cpaelzer> -O2 re-build segfault
<cpaelzer> -O1 (for completeness) rebuild ... seems to hang?
<Myon> probably not really relevant
<cpaelzer> agreed, but I want to rebuild -O0 again just to see it builds and then works
<cpaelzer> -O0 on arm64 really seems to be a good choice

<Myon> I can try if rebuilding makes it fail on sid
<Myon> cpaelzer: recompiling openjade in arm64/sid doesn't make it segfault
<Myon> so it seems this needs a focal-only fix
<Myon> cpaelzer: the openjade segfault is also present in pgpool2, so not just in ancient PostgreSQL :(

Changed in openjade (Ubuntu):
status: New → Triaged
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Get a postgresql9.5 package and build it, it will fail.
Re-run the failing command:

$ gdb /usr/bin/openjade
(gdb): run -wall -wno-unused-param -wno-empty -wfully-tagged -wnet -D . -D . -c /usr/share/sgml/docbook/stylesheet/dsssl/modular/catalog -d stylesheet.dsl -t sgml -i output-html -V html-index postgres.sgml

Program received signal SIGSEGV, Segmentation fault.
0x0000fffff7ee3040 in OpenSP::Vector<OpenJade_DSSSL::ProcessingMode::Rule*>::erase (p1=0x1, p2=0x2, this=<optimized out>, this=<optimized out>) at /usr/include/OpenSP/Vector.cxx:131
131 /usr/include/OpenSP/Vector.cxx: No such file or directory.
(gdb) bt
#0 0x0000fffff7ee3040 in OpenSP::Vector<OpenJade_DSSSL::ProcessingMode::Rule*>::erase (p1=0x1, p2=0x2, this=<optimized out>, this=<optimized out>) at /usr/include/OpenSP/Vector.cxx:131
#1 0x0000fffff7ee3038 in OpenJade_DSSSL::ProcessingMode::RootRule::compareSpecificity (this=<optimized out>, rule=...) at ProcessingMode.cxx:331
#2 0x0000fffff7ee4dc0 in OpenJade_DSSSL::ProcessingMode::addRootRule (this=<optimized out>, expr=..., ruleType=OpenJade_DSSSL::ProcessingMode::constructionRule, loc=..., interp=...)
    at /usr/include/OpenSP/Vector.h:49
#3 0x0000fffff7eec698 in OpenJade_DSSSL::SchemeParser::doRoot (this=this@entry=0xffffffffe800) at SchemeParser.cxx:484
#4 0x0000fffff7ef4254 in OpenJade_DSSSL::SchemeParser::parse (this=this@entry=0xffffffffe800) at SchemeParser.cxx:190
#5 0x0000fffff7ef9c0c in OpenJade_DSSSL::StyleEngine::parseSpec (this=this@entry=0xaaaaaab7f670, specParser=..., charset=..., id=..., mgr=..., defVars=...) at StyleEngine.cxx:166
#6 0x0000fffff7e78ad0 in OpenJade_DSSSL::DssslApp::processSysid (this=0xffffffffeae8, sysid=...) at /usr/include/OpenSP/CodingSystemKit.h:52
#7 0x0000fffff7b499c8 in OpenSP::EntityApp::processArguments(int, char**) () from /lib/libosp.so.5
#8 0x0000fffff7b39dd4 in OpenSP::CmdLineApp::run(int, char**) () from /lib/libosp.so.5
#9 0x0000aaaaaaacbc54 in ?? ()
#10 0x0000fffff76f0090 in __libc_start_main (main=0xaaaaaaacbc08, argc=21, argv=0xfffffffff258, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>,
    stack_end=<optimized out>) at ../csu/libc-start.c:308
#11 0x0000aaaaaaacd060 in ?? ()
Backtrace stopped: not enough registers or memory available to unwind further

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Harr,
so we knew we could rebuild it on the system
-O2 => segfault
-O0 => working

Building on LP (PPA) doesn't behave the same way.
Even thou https://launchpadlibrarian.net/471751679/buildlog_ubuntu-focal-arm64.openjade_1.4devel1-21.3ubuntu1~ppa1_BUILDING.txt.gz really built it -O0 the binary out of that package keeps crashing.

So we might need to debug it in -O2 + other local modification find the real issue and fix it.
The rebuild approach doesn't help for real :-/

tags: added: patch
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Ok, now things make sense.
src:openjade also builds libostyle1c2 and updating that to the -O0 build made it work.
So overall this would be a good fix still.

Drawback, I'd love to know what is going on under the covers ... :-/

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

If I run this with the -O0 build it never ever reaches
  OpenSP::Vector<OpenJade_DSSSL::ProcessingMode::Rule*>::erase
So the code is still broken, but we just don't reach it.
-O0 isn't even a good mitigation.

Tracking the root cause further.

Very late in the -O0 build it then hits it like:
::erase (this=0xaaaaab027250, p1=0xaaaaaaeb5cf0, p2=0xaaaaaaeb5cf8)
::erase (this=0xaaaaab46cf10, p1=0xaaaaab883120, p2=0xaaaaab883128)
...
All that looks reasonable.
The other call we see in -O2 builds is broken:
::erase (p1=0x1, p2=0x2, ...

But this is actually a header in: src:opensp

Defines in
/usr/include/OpenSP/Vector.cxx:331

for (const T *p = p1; p != p2; p++)
  # works on pointer p

But for any type that does not surely increase the right way.
E.g. p1 = 0x1, p2 = 0x2 but increment = 4 (due to pointer arithmetic)
Then it will increment p forever.

If we use <= it will stop as soon as we would run over.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

BTW src:opensp a Cxx file in includes really?
This is on version 1.5.2 for ages no upstream changes.

It is possible that depeding on the optimization of openjade on build it changes the code flow into ::erase and then triggers the bad behavior.

replaced the broken line with:
  for (const T *p = p1; p <= p2; p++)

Rebuilt openjade against that fixed version but that thought was too easy. It will still try to access p1=0x1 and fault. I'll need to understand where exactly those p1/p2 args come from and fix it there.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Download full text (7.4 KiB)

Three classes Rule, ElementRule and QueryRule define
    int compareSpecificity(const Rule &) const;
    int compareSpecificity2(const ElementRule *) const;
    int compareSpecificity2(const QueryRule *) const;

Per Backtrace we seem to be in a "Rule" object but since the passed object is optimized out I don't see the rule type. But all three implementations are defined as:

int ProcessingMode::Rule::compareSpecificity2(const RootRule *rule) const
{
  return - compareParts(rule);
}

And that is:
int ProcessingMode::Rule::compareParts(const Rule *r) const
{
  unsigned i1 = action().partIndex();
  unsigned i2 = r->action().partIndex();
  if (i1 == i2)
    return 0;
  return i1 < i2 ? -1 : 1;
}

I don't see the path to erase() anywhere in here. The optimization has torn this so much apart that I can't completely unwind the stack as needed.

With some gdb magic I was able to verify that on the way to death I pass:
Breakpoint 4, OpenJade_DSSSL::ProcessingMode::RootRule::compareSpecificity2 (this=0xaaaaaaeb58b0, rule=0xaaaaaae96ba0) at ProcessingMode.cxx:344

That is:
int ProcessingMode::RootRule::compareSpecificity2(const RootRule *rule) const
{
  int result = Rule::compareSpecificity2(rule);
  if (result)
    return result;
  return 0;
}

And the two objects are not yet optimized out:
(gdb) p *this
$5 = {<OpenJade_DSSSL::ProcessingMode::Rule> = {<OpenJade_DSSSL::MatchBase> = {_vptr.MatchBase = 0xfffff7fa97b0 <vtable for OpenJade_DSSSL::ProcessingMode::RootRule+88>, trivial_ = true},
    _vptr.Rule = 0xfffff7fa9770 <vtable for OpenJade_DSSSL::ProcessingMode::RootRule+24>, action_ = {ptr_ = 0xaaaaaaeb5b80}}, <No data fields>}
(gdb) p *rule
$6 = {<OpenJade_DSSSL::ProcessingMode::Rule> = {<OpenJade_DSSSL::MatchBase> = {_vptr.MatchBase = 0xfffff7fa97b0 <vtable for OpenJade_DSSSL::ProcessingMode::RootRule+88>, trivial_ = true},
    _vptr.Rule = 0xfffff7fa9770 <vtable for OpenJade_DSSSL::ProcessingMode::RootRule+24>, action_ = {ptr_ = 0xaaaaaae96b50}}, <No data fields>}

That seems to be the same rule but with different actions:
(gdb) p *(this->action_.ptr_)
$10 = {<OpenSP::Resource> = {count_ = 2}, defLoc_ = <incomplete type>, expr_ = {_vptr.Owner = 0xfffff7f98fb8 <vtable for OpenSP::Owner<OpenJade_DSSSL::Expression>+16>,
    p_ = 0xaaaaaaeb5b10}, insn_ = {<OpenSP::Ptr<OpenJade_DSSSL::Insn>> = {ptr_ = 0x0}, <No data fields>}, sosofo_ = 0x0, partIndex_ = 2}

(gdb) p *(rule->action_.ptr_)
$9 = {<OpenSP::Resource> = {count_ = 1}, defLoc_ = <incomplete type>, expr_ = {_vptr.Owner = 0xfffff7f98fb8 <vtable for OpenSP::Owner<OpenJade_DSSSL::Expression>+16>, p_ = 0xaaaaaae96b00},
  insn_ = {<OpenSP::Ptr<OpenJade_DSSSL::Insn>> = {ptr_ = 0x0}, <No data fields>}, sosofo_ = 0x0, partIndex_ = 1}

That is at:
#0 OpenJade_DSSSL::ProcessingMode::RootRule::compareSpecificity2 (this=0xaaaaaaeb58b0, rule=0xaaaaaae96ba0) at ProcessingMode.cxx:344
#1 0x0000fffff7ee3038 in OpenJade_DSSSL::ProcessingMode::RootRule::compareSpecificity (this=<optimized out>, rule=...) at ProcessingMode.cxx:331
#2 0x0000fffff7ee4dc0 in OpenJade_DSSSL::ProcessingMode::addRootRule (this=<optimized out>, expr=..., ruleType=OpenJade_DSSSL::ProcessingMode::constructionRule, loc=..., interp=...

Read more...

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

... make debugging harder

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

So we have a debuggable build with -O0 that doesn't trigger the error.
And one that "suddenly appears at ::erase with bad arguments".
:-/

We might have to come back to just set -O0 on arm64 ...

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I installed gcc/g++ in version 7 and 8 as well
and added in d/rules:
  CC=gcc-7
  export CC
  CPP=cpp-7
  export CPP
  CXX=g++-7
  export CXX

I see the build being:
libtool: compile: g++-7 -DHAVE_CONFIG_H -I. -I.. -I../grove -g --pipe -fpermissive -fno-lifetime-dse -O2 -MT GroveBuilder.lo -MD -MP -MF .deps/GroveBuilder.Tpo -c GroveBuilder.cxx -o GroveBuilder.o >/dev/null 2>&1

But the resulöting binary crashes in the same way and does not expose more insights.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

With GCC-7 the -O1 build segfaults as well.
Maybe more debug info in there?

=> nothing more helpful :-/

Lets see if we can get the same result with:
-fauto-inc-dec
-fbranch-count-reg
-fcombine-stack-adjustments
-fcompare-elim
-fcprop-registers
-fdce
-fdefer-pop
-fdelayed-branch
-fdse
-fforward-propagate
-fguess-branch-probability
-fif-conversion
-fif-conversion2
-finline-functions-called-once
-fipa-profile
-fipa-pure-const
-fipa-reference
-fipa-reference-addressable
-fmerge-constants
-fmove-loop-invariants
-fomit-frame-pointer
-freorder-blocks
-fshrink-wrap
-fshrink-wrap-separate
-fsplit-wide-types
-fssa-backprop
-fssa-phiopt
-ftree-bit-ccp
-ftree-ccp
-ftree-ch
-ftree-coalesce-vars
-ftree-copy-prop
-ftree-dce
-ftree-dominator-opts
-ftree-dse
-ftree-forwprop
-ftree-fre
-ftree-phiprop
-ftree-pta
-ftree-scev-cprop
-ftree-sink
-ftree-slsr
-ftree-sra
-ftree-ter
-funit-at-a-time

That is per [1] the list enabled in O1.
I got the above working in ggc-9 (in the sense that it builds and still crashes).

Bisecting build options from here ...

[1]: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

This is the most trimmed down versions that fails:
-O0 -finline-functions-called-once

But:
-O2 -fno-inline-functions-called-once
does not make it work, probably other optimizations can trigger the same issue.

But maybe the one above is better to debug?

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Download full text (3.7 KiB)

Backtrace is more readable now, erase comes from resize it seems.

#0 0x0000fffff7e4f620 in OpenSP::NCVector<OpenSP::Owner<OpenJade_DSSSL::Expression> >::erase (this=0xffffffffe0c8, p1=0xaaaaab46eaf0, p2=0xaaaaab46eb00)
    at /usr/include/OpenSP/Vector.cxx:132
#1 0x0000fffff7e4e0e4 in OpenSP::NCVector<OpenSP::Owner<OpenJade_DSSSL::Expression> >::resize (this=0xffffffffe0c8, n=1) at /usr/include/OpenSP/Vector.h:29
#2 0x0000fffff7ec9ee0 in OpenJade_DSSSL::SchemeParser::parseExpression (this=0xffffffffe538, allowed=16, expr=..., key=@0xffffffffe1cc: OpenJade_DSSSL::Identifier::notKey,
    tok=@0xffffffffe1c8: OpenJade_DSSSL::SchemeParser::tokenCloseParen) at SchemeParser.cxx:1004
#3 0x0000fffff7ec9e8c in OpenJade_DSSSL::SchemeParser::parseExpression (this=0xffffffffe538, allowed=0, expr=..., key=@0xffffffffe1cc: OpenJade_DSSSL::Identifier::notKey,
    tok=@0xffffffffe1c8: OpenJade_DSSSL::SchemeParser::tokenCloseParen) at SchemeParser.cxx:1001
#4 0x0000fffff7eccdd0 in OpenJade_DSSSL::SchemeParser::parseBegin (this=0xffffffffe538, expr=...) at SchemeParser.cxx:1551
#5 0x0000fffff7eccc54 in OpenJade_DSSSL::SchemeParser::parseBody (this=0xffffffffe538, expr=...) at SchemeParser.cxx:1539
#6 0x0000fffff7ec92c8 in OpenJade_DSSSL::SchemeParser::parseDefine (this=0xffffffffe538, ident=@0xffffffffe3d8: 0xaaaaab4827f0, expr=...) at SchemeParser.cxx:861
#7 0x0000fffff7ec8ed0 in OpenJade_DSSSL::SchemeParser::doDefine (this=0xffffffffe538) at SchemeParser.cxx:814
#8 0x0000fffff7ec5f98 in OpenJade_DSSSL::SchemeParser::parse (this=0xffffffffe538) at SchemeParser.cxx:172
#9 0x0000fffff7e9b2d8 in OpenJade_DSSSL::Interpreter::installBuiltins (this=0xaaaaaacf1b50) at Interpreter.cxx:2093
#10 0x0000fffff7e93448 in OpenJade_DSSSL::Interpreter::Interpreter (this=0xaaaaaacf1b50, groveManager=0xffffffffed00, messenger=0xffffffffea50, unitsPerInch=72000, debugMode=false,
    dsssl2=false, style=true, strictMode=false, fotbDescr=...) at Interpreter.cxx:162
#11 0x0000fffff7edb1b8 in OpenJade_DSSSL::StyleEngine::parseSpec (this=0xaaaaaabdae80, specParser=..., charset=..., id=..., mgr=..., defVars=...) at StyleEngine.cxx:85
#12 0x0000fffff7e2f784 in OpenJade_DSSSL::DssslApp::processSysid (this=0xffffffffea48, sysid=...) at DssslApp.cxx:138
#13 0x0000fffff7aaa9c8 in OpenSP::EntityApp::processArguments(int, char**) () from /lib/libosp.so.5
#14 0x0000fffff7a9add4 in OpenSP::CmdLineApp::run(int, char**) () from /lib/libosp.so.5

Also the line number is now correct on the access of pointer p and its destructor call:

line 132 is:
  ((X *)p)->~X();

(gdb) p p
$1 = (const OpenSP::Owner<OpenJade_DSSSL::Expression> *) 0xaaaaab46eb00

Also the passed pointers are intact:
(gdb) p p1
$6 = (const OpenSP::Owner<OpenJade_DSSSL::Expression> *) 0xaaaaab46eaf0
(gdb) p p2
$7 = (const OpenSP::Owner<OpenJade_DSSSL::Expression> *) 0xaaaaab46eb00

The pointers indicate that we are at p2 already.

And that is odd compared to p1.
(gdb) p *p1
$9 = {_vptr.Owner = 0xfffff7f8a040 <vtable for OpenSP::Owner<OpenJade_DSSSL::Expression>+16>, p_ = 0x0}
(gdb) p *p2
$10 = {_vptr.Owner = 0x6c00000063, p_ = 0x1d1}

See how the _vptr.Owner has no reasonable type.

(gdb) ptype T
type = class OpenSP...

Read more...

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

The call is from:
OpenSP::NCVector<OpenSP::Owner<OpenJade_DSSSL::Expression> >::resize (this=0xffffffffe0c8, n=1)

That means resize it to size "1".
And it would be ok to delete all later elements.

  void resize(size_t n) {
    if (n < size_)
      erase(ptr_ + n, ptr_ + size_);
    else if (n > size_)
      append(n - size_);
  }

p ptr_
$14 = (OpenSP::Owner<OpenJade_DSSSL::Expression> *) 0xaaaaab46eae0

And therefore ptr_ + size_ => 0xaaaaab46eb00
But there is no valid OpenSP::Owner<OpenJade_DSSSL::Expression at this address.

999 for (;;) {
1000 args.resize(args.size() + 1);
1001 if (!parseExpression(allowCloseParen, args.back(), key, tok))
1002 return 0;
1003 if (!args.back()) {
1004 args.resize(args.size() - 1); <===
1005 break;
1006 }
1007 }

Here:
(gdb) p args
$16 = {_vptr.NCVector = 0xfffff7f8a020 <vtable for OpenSP::NCVector<OpenSP::Owner<OpenJade_DSSSL::Expression> >+16>, size_ = 2, ptr_ = 0xaaaaab46eae0, alloc_ = 2}

But the size is only "2" elements, not three.
(gdb) p *(args.ptr_+0)
$21 = {_vptr.Owner = 0xfffff7f8a040 <vtable for OpenSP::Owner<OpenJade_DSSSL::Expression>+16>, p_ = 0xaaaaab46f9f0}
(gdb) p *(args.ptr_+1)
$22 = {_vptr.Owner = 0xfffff7f8a040 <vtable for OpenSP::Owner<OpenJade_DSSSL::Expression>+16>, p_ = 0x0}
(gdb) p *(args.ptr_+2)
$23 = {_vptr.Owner = 0x6c00000063, p_ = 0x1d1}

b OpenJade_DSSSL::SchemeParser::parseExpression
b OpenSP::NCVector<OpenSP::Owner<OpenJade_DSSSL::Expression> >::resize

(gdb) p tok
$30 = (OpenJade_DSSSL::SchemeParser::Token &) @0xffffffffe1c8: OpenJade_DSSSL::SchemeParser::tokenCloseParen

And I see him taking the route I'd expect then:
(gdb)
977 switch (tok) {
(gdb)
1090 break;
(gdb)
1092 return 1;

That is the default: in the switch a tokenCloseParen doesn't have an entry.

But then it re-enters the switch without obvious reason.

1003 if (!args.back()) {
(gdb) l
998 NCVector<Owner<Expression> > args;
999 for (;;) {
1000 args.resize(args.size() + 1);
1001 if (!parseExpression(allowCloseParen, args.back(), key, tok))
1002 return 0;
1003 if (!args.back()) {
1004 args.resize(args.size() - 1);
1005 break;
1006 }
1007 }

It didn't pass lines 998-1002 at all, not do I see why it would jump up here?

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

But we actually have 2 entries.
It should not break.

(gdb) p args
$32 = {_vptr.NCVector = 0xfffff7f8a020 <vtable for OpenSP::NCVector<OpenSP::Owner<OpenJade_DSSSL::Expression> >+16>, size_ = 2, ptr_ = 0xaaaaab46eae0, alloc_ = 2}
(gdb) p *(args.ptr_+0)$33 = {_vptr.Owner = 0xfffff7f8a040 <vtable for OpenSP::Owner<OpenJade_DSSSL::Expression>+16>, p_ = 0xaaaaab46f9f0}
(gdb) p *(args.ptr_+1)
$34 = {_vptr.Owner = 0xfffff7f8a040 <vtable for OpenSP::Owner<OpenJade_DSSSL::Expression>+16>, p_ = 0x0}
(gdb) p *(args.ptr_+2)
$35 = {_vptr.Owner = 0x6c00000063, p_ = 0x1d1}

So resize should be called with argument n=1
  1004 args.resize(args.size() - 1);
(gdb) p args.size()
$36 = 2

But instead:
#0 OpenSP::NCVector<OpenSP::Owner<OpenJade_DSSSL::Expression> >::resize (this=0xffffffffe0c8, n=2) at /usr/include/OpenSP/Vector.h:27

That makes the pointer calculation go too far as +2 is outside the vector.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

One last shot, I've found that this is unexpected by the vector code:
  void resize(size_t n) {
    if (n < size_)
      erase(ptr_ + n, ptr_ + size_);
    else if (n > size_)
      append(n - size_);

But in this case:
(gdb) p n
$39 = 2
(gdb) p size_
$40 = 2

So it would need to do NOTHING!
It already is at the requested size.

I found through gdb that it does that correctly.
The first call is:
(gdb) p n
$39 = 2
(gdb) p size_
$40 = 2
and it does nothing.

Then next time it is
(gdb) p size_
$43 = 2
(gdb) p n
$44 = 1

And it will call erase.
(gdb) p *(ptr_+0)
$49 = {_vptr.Owner = 0xfffff7f8a040 <vtable for OpenSP::Owner<OpenJade_DSSSL::Expression>+16>, p_ = 0xaaaaab46f9f0}
(gdb) p *(ptr_+1)
$50 = {_vptr.Owner = 0xfffff7f8a040 <vtable for OpenSP::Owner<OpenJade_DSSSL::Expression>+16>, p_ = 0x0}
(gdb) p *(ptr_+2)
$51 = {_vptr.Owner = 0x6c00000063, p_ = 0x1d1}

So it thinks it increases (but it doesn't) and then it decreases the vector and fails.
Hrm, still nothing to fix in there.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I've gone rather deep on this, but I think I need to timebox this now.

Remaining questions:
- Maybe the inlining pre-evaluates things wrong and combines resize/size
  in a bad way to pass the value 2 where it should be 1?
- maybe the code<->line association is wrong and this is the first
  args.resize call that uses "+1" but that should just convert to
  a append (instead of erase)
- I can't see it yet where/why the #2 comes from (comment #16) - not the reason
  why it goes in to that path at all (comment #15).
- it seems as if the vector size is racy or inlined with a static value (since it
  depends on optimization flags in gcc)
- I also haven't seen why this should be arm64 only, but it is

We can still use the mitigation shown in comment #2, would someone be willing to review and sponsor that if it is ok?

Revision history for this message
Robie Basak (racb) wrote :

+1 to upload the patch in comment #2 as you need it for release purposes, but as this is a workaround rather than a solution I think this bug should remain open.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Thanks, I have modified the changelog to not - auto-close the bug, but still refer to it.

Uploaded ...

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

This migrated and fixed the issue.
=> https://launchpad.net/ubuntu/+source/openjade/1.4devel1-21.3ubuntu1

But as discussed before since this is more a mitigation than a fix we keep the bug open.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

FYI I got reports this might now also affect ppc64 and amd64.
For ppc64 the confirmation is in that -O0 build fixes it.
So this might have to be re-rolled setting -O0 on all architectures.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
tags: added: update-excuse
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I've found that:
- recompiling openjade on Debian nowadays makes it exhibit the same behavior
- recompiling openjade in Debian&Ubuntu with gcc-9 also fails the same way

I'd love to find the root cause, but my hope of identifying either a compiler-default-option or compiler-version that made it break failed. It stays as it was before, you better not recompile openjade otherwise you'll need -O0 or find the yet unknown root cause.

IMHO for mostly a doc conversion tool (that also seems mostly dead upstream) speed isn't super-important and while not perfect -O0 should be ok'ish for the time being.

For now the mitigation worked, pgpool2 for example now built fine:
  https://launchpad.net/ubuntu/+source/pgpool2/4.1.4-2

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Notes to myself for another day if I need to revisit this

Repro:
# enable sources for apt
$ apt upgrade
$ apt install dpkg-dev openjade docbook-dsssl
$ apt source pgpool2
$ cd pgpool2-4.1.1/doc/src/sgml/
$ openjade -wall -wno-unused-param -wno-empty -wfully-tagged -c /usr/share/sgml/docbook/stylesheet/dsssl/modular/catalog -d stylesheet.dsl -t sgml -i output-html -V html-index pgpool.sgml

That confirms Ubuntu breaking while Debian works.

Build openjade from the very same source (upstream is dead btw):
$ apt install ubuntu-dev-tools
$ apt build-dep openjade
$ pull-debian-source openjade
$ cd openjade-1.4devel1
$ ./debian/rules build

Then test with this path to the binary-wrapper /root/openjade-1.4devel1/jade/openjade

$ /root/openjade-1.4devel1/jade/openjade -wall -wno-unused-param -wno-empty -wfully-tagged -c /usr/share/sgml/docbook/stylesheet/dsssl/modular/catalog -d stylesheet.dsl -t sgml -i output-html -V html-index pgpool.sgml

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Since I before had only arm traces here, this is x86:

Program received signal SIGSEGV, Segmentation fault.
__memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:384
384 ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: No such file or directory.
(gdb) bt
#0 __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:384
#1 0x00007ffff7eec459 in OpenSP::Vector<OpenJade_DSSSL::ProcessingMode::Rule*>::erase (p1=<optimized out>, p2=0x555555fc23a0, this=<optimized out>, this=<optimized out>)
    at /usr/include/OpenSP/Vector.cxx:134
#2 0x0000555555c3b378 in ?? ()
#3 0x00007ffff7eede35 in OpenJade_DSSSL::ProcessingMode::addRootRule (this=<optimized out>, expr=..., ruleType=OpenJade_DSSSL::ProcessingMode::constructionRule, loc=..., interp=...)
    at ProcessingMode.cxx:376
#4 0x00007ffff7ef4ab7 in OpenJade_DSSSL::SchemeParser::doRoot (this=0x7fffffffe150) at SchemeParser.cxx:484
#5 0x00007ffff7efba28 in OpenJade_DSSSL::SchemeParser::parse (this=this@entry=0x7fffffffe150) at SchemeParser.cxx:190
#6 0x00007ffff7f00f1f in OpenJade_DSSSL::StyleEngine::parseSpec (this=this@entry=0x5555556201c0, specParser=..., charset=..., id=..., mgr=..., defVars=...) at StyleEngine.cxx:166
#7 0x00007ffff7e8a580 in OpenJade_DSSSL::DssslApp::processSysid (this=0x7fffffffe400, sysid=...) at DssslApp.cxx:138
#8 0x00007ffff7963c7f in OpenSP::EntityApp::processArguments(int, char**) () from /usr/lib/libosp.so.5
#9 0x00007ffff795339b in OpenSP::CmdLineApp::run(int, char**) () from /usr/lib/libosp.so.5
#10 0x000055555557496b in main (argc=16, argv=0x7fffffffeb18) at jade.cxx:206

summary: - openjade segfaults on arm (due to gcc optimization)
+ openjade segfaults on all arch
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

There was a fix suggested on https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=975242 and I tried that. It works great.

I've let the Debian postgresql-maintainer know and there might be an NMU that would then allow to make this package a sync again.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Synced the fix after Christoph uploaded it
=> https://launchpad.net/ubuntu/+source/openjade/1.4devel1-22

Changed in openjade (Debian):
status: Unknown → Fix Released
Revision history for this message
Bryce Harrington (bryce) wrote :

Unless I've misunderstood, from the last few comments it sounds like this is now fixed for hirsute thanks to the sync'd package.
Please reopen if there is any followup work needed.

Changed in openjade (Ubuntu):
status: Triaged → Fix Released
Bryce Harrington (bryce)
Changed in pgpool2 (Ubuntu):
status: New → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.