bzr should do smarter merging of .po files

Bug #884270 reported by Steve Langasek
Affects: Bazaar
Status: Fix Released
Importance: High
Assigned to: Vincent Ladeuil

Bug Description

Today, if there are local Ubuntu translations or new UI strings added in an Ubuntu package, and updates happen to the translations in Debian, bzr merge-package does a very bad job of merging them. Fundamentally the reason for this is that .pot and .po files are stanza-based files, and bzr is using naive line-based merges of the files. This gives inferior results compared to an intelligent msgmerge for some things (such as fuzzy translations), and pathological results for others (such as modification date fields).

It would be splendid if bzr could intelligently merge .po files using msgmerge.
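
To illustrate what "stanza-based" means here, a minimal Python sketch (not bzr code; `split_stanzas` and the sample text are made up for illustration) that groups a .po file into its blank-line-separated entries. A stanza-aware tool such as msgmerge reasons about these entries as units, whereas a line-based merge only sees individual lines such as the creation-date header.

```python
def split_stanzas(po_text):
    """Split .po/.pot text into stanzas (entries separated by blank lines)."""
    stanzas = []
    current = []
    for line in po_text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            # A blank line closes the stanza collected so far.
            stanzas.append("\n".join(current))
            current = []
    if current:
        stanzas.append("\n".join(current))
    return stanzas


# Hypothetical sample: a header stanza, a fuzzy entry, an untranslated entry.
SAMPLE = '''msgid ""
msgstr ""
"POT-Creation-Date: 2011-11-01 10:00+0000\\n"

#, fuzzy
msgid "Hello"
msgstr "Bonjour"

msgid "Bye"
msgstr ""
'''

stanzas = split_stanzas(SAMPLE)
for stanza in stanzas:
    print(stanza.splitlines()[0])
```

If both sides of a merge regenerate the header, a line-based merge conflicts on the date line even though no translation changed.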


Revision history for this message
Steve Langasek (vorlon) wrote :

Attached is the horrible script I currently use locally to resolve .po and .pot file conflicts when doing package merges. It should be noted that 'make' is a very poor heuristic for handling the .pot file regeneration; in particular, any debian/po/templates.pot files should be regenerated instead using debconf-updatepo with appropriate options. And making bzr merge call 'make' is in general fairly horrid. :)

Revision history for this message
Martin Pool (mbp) wrote : Re: [Bug 884270] Re: bzr should do smarter merging of .po files

  status confirmed
  importance high

Changed in udd:
importance: Undecided → High
status: New → Confirmed
Revision history for this message
Vincent Ladeuil (vila) wrote :

@Steve: Do you have a (branch, revno) where you use this script to resolve conflicts before committing ?

Changed in udd:
assignee: nobody → Vincent Ladeuil (vila)
status: Confirmed → In Progress
Revision history for this message
Steve Langasek (vorlon) wrote :

Branches where I've done this recently include lp:ubuntu/debhelper, lp:ubuntu/console-setup, and lp:ubuntu/adduser.

Revision history for this message
Vincent Ladeuil (vila) wrote :

First-pass analysis: the main part of the script seems to be merging .po files with msgmerge.

The first idea is to have a merge hook for .po files with the following constraints/fallouts:

- the .pot file must be usable (trying to require the .pot to be merged without conflicts *before* any .po file sounds too complex); if it contains conflicts, the hook will not apply

- once the .pot file is in good shape, the user can still use 'bzr remerge <right path>/.po' to reprocess the .po files and the hook will then apply

- some configuration is needed to apply the merge hook to the right .po files and acquire the path for the .pot file
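
The first constraint could be checked roughly as follows. This is only a sketch under assumptions: `has_conflict_markers` and `hook_applies` are hypothetical names, and the check simply looks for the textual conflict markers that a merge leaves in a file.

```python
# Line prefixes that mark the start and end of a text conflict region.
CONFLICT_PREFIXES = ("<<<<<<< ", ">>>>>>> ")


def has_conflict_markers(text):
    """Return True if any line starts with a conflict marker."""
    return any(line.startswith(CONFLICT_PREFIXES)
               for line in text.splitlines())


def hook_applies(pot_text):
    """The .po hook is only worth running against a usable .pot file."""
    return not has_conflict_markers(pot_text)


clean_pot = 'msgid "Hello"\nmsgstr ""\n'
conflicted_pot = (
    '<<<<<<< TREE\n'
    'msgid "Hello"\n'
    '=======\n'
    'msgid "Hello there"\n'
    '>>>>>>> MERGE-SOURCE\n'
    'msgstr ""\n'
)
print(hook_applies(clean_pot), hook_applies(conflicted_pot))
```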

If conflicts occur in .pot file in most scenarios, we may want to investigate a merge hook for it but is there some msgmerge invocation (msgcat ?) for that or is a specific solution required for that ?

Steve ?

Revision history for this message
Steve Langasek (vorlon) wrote :

'bzr remerge' as a strategy sounds good to me!

On Fri, Nov 18, 2011 at 12:11:49PM -0000, Vincent Ladeuil wrote:
> If conflicts occur in .pot file in most scenarios, we may want to
> investigate a merge hook for it but is there some msgmerge invocation
> (msgcat ?) for that or is a specific solution required for that ?

> Steve ?

Most cases that will cause conflicts in .po files also cause conflicts in
.pot files; there's no fix for the .pot file itself with msgmerge (msgmerge
explicitly acts only on .po files using .pot files as input), and the
generation of .pot files is context-specific, requiring running $something.
(Options are: './configure && make $domain.pot', 'debconf-updatepo
--skip-merge', 'po4a $args', ...)

So I don't think there's a good merge hook for .pot files themselves.

--
Steve Langasek Give me a lever long enough and a Free OS
Debian Developer to set it on, and I can move the world.
Ubuntu Developer http://www.debian.org/
<email address hidden> <email address hidden>

Revision history for this message
Martin Pool (mbp) wrote :

So if the pot file is generated but checked in, then we probably can't do anything but regenerate it in some tree-specific way.

Looking at the structure of pot files, they seem like they could potentially be automatically resolved, if that was what you wanted: a merge of the file rather than regenerating it.

We could possibly have a post-resolve hook that runs this on the po file after pot file conflicts are resolved.

We could also automatically mark the file resolved when the conflict markers are removed (by re-making it) which would be well worthwhile for other reasons and probably already has a bug number. That would cut all the actual invocations of bzr out of your script.

Revision history for this message
Vincent Ladeuil (vila) wrote :

Ok, I'll prototype a merge hook calling msgmerge as outlined above.

Note that remerge currently has some limitations that may reduce its usefulness. I'm not clear about the consequences, but at worst it would mean that it can be used only when the remaining conflicts are the .po-related ones.

If we encounter issues around that I'll fix them but I wanted to make sure you know about the possible pitfalls.

Revision history for this message
Steve Langasek (vorlon) wrote :

On Mon, Nov 21, 2011 at 09:04:54AM -0000, Martin Pool wrote:
> So if the pot file is generated but checked in, then we probably can't
> do anything but regenerate it in some tree-specific way.

> Looking at the structure of pot files, they seem like they could
> potentially be automatically resolved, if that was what you wanted: a
> merge of the file rather than regenerating it.

I think regenerating .pot files is more reliable than trying to merge them,
since there are varying scenarios which need to be resolved differently in
order to make sure the .pot file matches the strings in the source. (E.g.,
new strings added on both left and right, vs. the same string modified on
right but dropped on left)

So it's an attractive proposition, but I don't think it would work in
practice.


Revision history for this message
Martin Pool (mbp) wrote :

So the basic plan, as I understand it, is:

 - you do a bzr merge
 - it conflicts on the pot and po files
 - you manually regenerate the pot file, eg by running make
 - you run `bzr remerge po/*.po`
 - bzr fires a hook which calls out to msgmerge

vila, is that what you intend? slangasek, would it be a reasonable solution?
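
For the last step, the call out to msgmerge might look something like the Python sketch below. The function names and arguments are hypothetical. `-N` (`--no-fuzzy-matching`), `-C` (`--compendium`) and `-o` (`--output-file`) are standard msgmerge options, but whether the hook should use exactly this combination is an assumption.

```python
import subprocess


def build_msgmerge_command(this_po, other_po, pot_file, result):
    """Build an argument list for one possible msgmerge invocation.

    The translations in other_po are matched against pot_file, with
    this_po supplied as a compendium (an additional source of
    translations). The output is written to result.
    """
    return ["msgmerge", "-N", other_po, pot_file,
            "-C", this_po, "-o", result]


def run_msgmerge(this_po, other_po, pot_file, result):
    """Run msgmerge; return True when it exits with status 0."""
    cmd = build_msgmerge_command(this_po, other_po, pot_file, result)
    return subprocess.call(cmd) == 0


# Illustrative file names only; nothing is executed here.
cmd = build_msgmerge_command("fr.po.THIS", "fr.po.OTHER",
                             "po/hello.pot", "fr.po")
print(" ".join(cmd))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with unusual file names.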

Revision history for this message
Vincent Ladeuil (vila) wrote :

> vila, is that what you intend?

precisely

Revision history for this message
Steve Langasek (vorlon) wrote :

On Tue, Nov 22, 2011 at 07:15:17AM -0000, Martin Pool wrote:
> So the basic plan, as I understand it, is:

> - you do a bzr merge
> - it conflicts on the pot and po files
> - you manually regenerate the pot file, eg by running make
> - you run `bzr remerge po/*.po`
> - bzr fires a hook which calls out to msgmerge

> vila, is that what you intend? slangasek, would it be a reasonable
> solution?

Sounds perfectly reasonable to me.

Thanks,

Revision history for this message
Vincent Ladeuil (vila) wrote :

Ok, I've prototyped a merge hook calling msgmerge as outlined above (available in the branch associated with this bug).

See 'bzr help po_merge' for details.

Note that there are some glitches I discovered while writing/testing this plugin:

1) *during* the merge, the resulting merged .pot file is not easily available (either the .po files are merged before it or the resulting file is in limbo, waiting for the merge to complete before being put into the working tree).

This is both good and bad:

- good: the .pot file present in the tree (with the content prior to the merge) is available and therefore can be used

- bad: since this is the basis .pot file it may not contain the expected strings

2) remerge currently refuses to act on files that are not conflicted

Since the hook is triggered with the basis .pot file, there is no easy way to trigger it again with the merged .pot file (once conflicts are resolved).

That being said, I've added a test that outlines a possible workflow:

- merge with the hook disabled => conflicts in .pot and .po files
- resolve conflicts in .pot files
- remerge with the hook enabled => the .po files are then merged with the correct .pot file

Before going further, I'd appreciate some feedback, especially on:
- the workflow above, is it acceptable ? Does it capture enough use cases ? Can you see simpler alternatives ?
- config: there is one option for the .pot file and one for the .po files, it's a bit error prone. If there is a single .pot file in a given po directory, then a single option specifying this po dir can be used and '*.po' and '*.pot' can be used by the plugins to achieve the same effect. This assumes that there is always a single .pot file in a given directory. Is this assumption valid ?

Revision history for this message
Vincent Ladeuil (vila) wrote :

Clarification: the attached branch is a bzr branch with the plugin embedded as a core plugin.

Revision history for this message
Martin Pool (mbp) wrote :

slangasek said on irc he doesn't currently have any branches with this problem.

I suggest you advertise on ubuntu-devel-discuss.

Revision history for this message
Vincent Ladeuil (vila) wrote :

Based on an IRC discussion with David Planella, I'll go with the simplified config:
- po_merge.pot_dirs = po (default)
- po_merge.command = <as above> (default)
- po_merge.po_glob = *.po (default)
- po_merge.pot_glob = *.pot (default)

So that the default values should cover most of the packages and ``po_merge.pot_dirs`` is the only option people should have to override for the special cases.
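
How these options could combine to pick the .pot file for a given .po file is sketched below in Python. This is not the plugin's code: `find_pot_candidates` is a hypothetical helper, and the exact matching rules (directory compared literally, globs applied to the file name only) are assumptions.

```python
import fnmatch
import posixpath


def find_pot_candidates(po_path, tree_paths, pot_dirs=("po",),
                        po_glob="*.po", pot_glob="*.pot"):
    """Return the .pot files that could serve po_path, or [] if none.

    po_path and tree_paths are '/'-separated paths relative to the
    tree root.
    """
    po_dir, po_name = posixpath.split(po_path)
    # Only files directly inside one of the configured directories count.
    if po_dir not in pot_dirs:
        return []
    if not fnmatch.fnmatchcase(po_name, po_glob):
        return []
    return sorted(
        path for path in tree_paths
        if posixpath.dirname(path) == po_dir
        and fnmatch.fnmatchcase(posixpath.basename(path), pot_glob)
    )


# Hypothetical tree contents.
paths = ["po/fr.po", "po/hello.pot",
         "debian/po/de.po", "debian/po/templates.pot", "src/main.c"]
print(find_pot_candidates("po/fr.po", paths))
print(find_pot_candidates("debian/po/de.po", paths,
                          pot_dirs=("po", "debian/po")))
```

With a single .pot file per directory the result has exactly one element; more than one would mean the single-.pot assumption does not hold for that tree.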

Vincent Ladeuil (vila)
affects: udd → bzr
Revision history for this message
Steve Langasek (vorlon) wrote :

Hi Vincent,

On Mon, Nov 28, 2011 at 02:54:17PM -0000, Vincent Ladeuil wrote:
> Based on an IRC discussion with David Planella, I'll go with the
> simplified config:
> - po_merge.pot_dirs = po (default)

Was 'po_merge.pot_dirs = po debian/po' discussed and explicitly ruled out?
debian/po is not a special case for me, and I would probably wind up setting
this globally. Maybe this option could be a default from bzr-builddeb?


Revision history for this message
Vincent Ladeuil (vila) wrote :

@Steve: 'po_merge.pot_dirs = po,debian/po' has been mentioned in the review and will be the default in the plugin itself.

Vincent Ladeuil (vila)
Changed in bzr:
milestone: none → 2.5b4
status: In Progress → Fix Released