
Re: commit sizes, coding conventions, and back-ldif



Hallvard B Furuseth wrote:
My patch to back-ldif/ldif.c for ITS#5408 is larger than the ldif.c file
itself, which has left me wondering what to do with monster patches:

Should they be split up in stepwise commits so it's easier to see the
logic of each change?  When reading patches, I vastly prefer them split
up that way, but I don't know how much that is worth in the CVS tree.
E.g. first a commit which preserves the program logic including bugs and
just moves code around, then one with a bugfix which now can be small
and easy to read, then one for the next bug, and so on.

Besides being more work, one problem with that approach is that one
can make an error in a poorly tested intermediate version - since that
version was never intended to be used anyway.  Also I don't know what
impact it has on CVS merging.

Stepwise - that makes sense to me when you are actually committing while the work is in progress. Since you've chosen not to commit any intermediate patches as you developed them, I don't think it really makes sense to artificially recreate those intermediate steps after the fact. If you want the sequence of logic to be visible, just describe it in the comments.


If there were well-defined functional boundaries for the patches, then that would still be worth preserving in separate commits, particularly if it were possible to roll back one patch and still keep everything else. It's always preferable to have smaller patches. And it's always preferable to have small, well-defined bug reports in the ITS, so we can have one patch = one ITS. In this case, ITS#5408 is quite a bundle.

Also, what's the threshold for when it makes sense to reformat and
rearrange code just for the sake of readability?  back-ldif has no
consistent coding style.  Normally it's "don't reformat, it makes CVS
merges harder".  In this case it was easy with some of it - I touch more
than half of the lines anyway, so I could just as well reformat those
and a few more to something closer to the OpenLDAP conventions (to the
degree I understand them).  But maybe in this case I should just as well
reformat it all, and maybe also it's a good time to split back-ldif
up into 9 files of ~100-300 lines instead of one of ~1500 lines.
(Regarding CVS merges, the HEAD and RE24 versions are equal, but RE23
differs.)

If the new code is going to be so different anyway, it may be best to only do two commits - first with a pure reformat of the existing code (and no syntactic changes), and then with the bugfix.


At the moment I see no compelling reason to split back-ldif into multiple files.

Another extreme example was a recent change to a big chunk of code,
where the only change was to wrap the code in a new if-test and indent
it.  It would have been nice to have a comment in the commit, or a 1-line
commit + a cosmetic commit.

Regarding formatting, when I do it I'd prefer to do it right - but I've
never quite figured out what the preferred conventions are.  We've had
style threads before, but I don't remember if any of them got very far.
So I thought I'd list some rules and confusions of my own that I've
noticed.

Just offering my personal views, nothing official here.

* Generous whitespace:
   - Around parens in control statements - "if ( x )" etc.

Yeah, I noticed that some of the older code treated "if" as a function call, so the parens were adjacent to the "if": "if( x )" which I'm not fond of...


   - Inside non-empty parens in function calls, sometimes sizeof,
     sometimes inside [].

I generally don't like spaces inside []. There are some sections of the DN parsing code (using multi-dimensional arrays) where it helps readability, though.


   - Usually around binary operators.

However, whitespace between two consecutive parens is often, but not always,
omitted, e.g. "foo( bar( baz ))".  Is there some preferred guideline about that?

How about whitespace around parens that are for grouping?
E.g. "if ( (ch = getc(f)) != EOF )".

I prefer no whitespace between consecutive parens. Naked parens always look like a syntax error to me, they're an immediate distraction that I then have to dismiss before focusing on the point of interest.
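For illustration, here is a short C sketch written with the spacing preferences described above: spaces inside the parens of control statements and function calls, no spaces inside [], and no space between consecutive closing parens, as in "foo( bar( baz ))". The grouping parens around the assignment are spaced as in the quoted getc() example. The names tally() and tally_fresh() are hypothetical and not part of back-ldif or any OpenLDAP API; the code also assumes 8-bit bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count bytes read from f into hist[], return line count. */
static int
tally( FILE *f, unsigned long *hist )
{
	int ch, lines = 0;

	assert( f != NULL && hist != NULL );
	while ( (ch = getc( f )) != EOF ) {
		/* no spaces inside [] */
		hist[ch]++;
		if ( ch == '\n' )
			lines++;
	}
	return lines;
}

/* Hypothetical wrapper: clear hist[] first; memset() returns hist, so the calls nest. */
static int
tally_fresh( FILE *f, unsigned long *hist )
{
	/* consecutive closing parens written as ")))" with no spaces */
	return tally( f, memset( hist, 0, 256 * sizeof( hist[0] )));
}
```

The nested call at the end is the case the question was about: each call keeps its inner spaces, and only the run of closing parens is collapsed.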


* Similarly generous use of {} for if statements etc, usually used even
   when enclosing just a single statement.

I prefer no brackets for single statements, but I recognize that this leads to inconsistencies. (Particularly if the single statement is itself an if, and there is an else clause involved. In that case you're forced to use brackets anyway to disambiguate...)
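The dangling-else case mentioned above can be shown in a few lines of C. classify() and classify_unbraced() are hypothetical names; the second function is deliberately indented the way its author presumably meant it, which is not how the compiler parses it.

```c
#include <assert.h>

/* Braces around the inner if make the else belong to the outer if. */
static int
classify( int x, int y )
{
	if ( x > 0 ) {
		if ( y > 0 )
			return 1;
	} else
		return -1;
	return 0;
}

/*
 * Same text without the braces.  The indentation shows the intent, but C
 * pairs an else with the nearest unmatched if, so this else actually
 * belongs to "if ( y > 0 )".
 */
static int
classify_unbraced( int x, int y )
{
	if ( x > 0 )
		if ( y > 0 )
			return 1;
	else
		return -1;
	return 0;
}
```

With x <= 0 the two functions disagree: classify( -1, 5 ) returns -1, while classify_unbraced( -1, 5 ) returns 0 because its else pairs with the inner if. GCC with -Wall typically warns about the second form ("suggest explicit braces to avoid ambiguous 'else'").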


* Indent 1 tab for continuation lines with the same paren level, e.g.
	x = foo(
		bar, baz );
	y = one
		? two
		: three;

BTW, emacs will format this the OpenLDAP way:
	wups(
		foo, bar );
but if there is anything after the '(' it aligns bar with it:
	wups( foo,
	      bar );
OpenLDAP code often does put arguments after the '(' but to my eyes it
looks easier to read without that, in addition to making emacs happy.

I prefer as many arguments as will fit on the first line. Excessive line breaks make it harder to grep for meaningful patterns.
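A sketch of that style in C, assuming the tab-width-4, under-80-columns rule: as many arguments as fit go on the first line, and the remainder continue one tab deeper. format_entry() is a hypothetical helper that writes a DN line plus one attribute line in LDIF form; it is not back-ldif's actual code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "dn: <dn>" plus one attribute line into buf.
 * A base64-encoded value uses "::" as the separator, as in LDIF.
 * Returns what snprintf() returns.
 */
static int
format_entry( char *buf, size_t len, const char *dn, const char *attr,
	const char *value, int is_base64 )
{
	assert( buf != NULL && len > 0 );
	return snprintf( buf, len, "dn: %s\n%s%s %s\n", dn, attr,
		is_base64 ? "::" : ":", value );
}
```

Both the parameter list and the snprintf() call are filled up to the line limit before breaking, so a grep for "snprintf( buf, len" still finds the format string on the same line.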


Complex multi-line statements require more creativity, but I can't
tell what, if any, ideas lie behind their indentation (beyond trying
to avoid such statements).  Don't know if e.g. this is OpenLDAPpy:
	if ( (one( two,
			three )&&
		(four ||
		 (five&&
		six))) )


* Tab-width = 4 columns.  Makes a difference for when to break lines
   (keeping line length < 80), how to align comments after code on a
   line, and to align declarations and multi-line statements.

Yes, that's pretty much carved in stone.

   It doesn't hurt to try to keep code nice-looking with tab width 8 too.
   (E.g. comments above statements instead of aligned to the right, and
   often do not align variables in declarations.)

Not a consideration. 8-column tabs would spill a lot of lines over 80 columns but there's no reason to worry about that.


* Indent with tabs, not spaces.

Yes.

* Usually align comments/decls to the right of code with tabs too, but
   not always.  (If we used space to align after code, it would stay
   aligned regardless of tab-width.)

I prefer tabs.
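The point about tabs versus spaces after code can be checked with a small C function that reports the display column where a trailing /* comment */ starts. comment_column() is hypothetical; it assumes ASCII text and that the first "/*" on the line begins the comment.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: display column at which the first comment opener
 * on line starts, or -1 if there is none, with tabs expanded to
 * multiples of tabwidth.
 */
static int
comment_column( const char *line, int tabwidth )
{
	const char *start, *p;
	int col = 0;

	assert( line != NULL && tabwidth > 0 );
	start = strstr( line, "/*" );
	if ( start == NULL )
		return -1;
	for ( p = line; p < start; p++ ) {
		if ( *p == '\t' )
			col += tabwidth - col % tabwidth;
		else
			col++;
	}
	return col;
}
```

With tabs, "int n;\t/* count */" and "x;\t\t/* value */" both put the comment at column 8 when tabs are 4 wide, but at columns 8 and 16 when tabs are 8 wide; padding both with spaces to column 8 keeps them aligned at either width.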

* Function names at first column in function definitions.

Yes.

-- 
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP     http://www.openldap.org/project/