Patch to be tolerant of whitespace in URLs

John Tytgat joty at netsurf-browser.org
Sat May 24 13:02:27 BST 2008


In message <1211628803.4056.7.camel at felix>
          Philip Boulain <prb at ecs.soton.ac.uk> wrote:

> The attached patch makes url_normalize take care of whitespace in a
> fairly useful way, consistent with other browsers:
> [...]

I can't judge for the overall correctness but a couple of suggestions:

> +	/* encode any remaining (internal) whitespace */
> +	for (i = 0; i < len; i++) {
> +		if(isspace((*result)[i])) {
> +			/* snprintf is all too keen to write damn nulls */
> +			char esc[4];
> +			char space = (*result)[i];
> +			memmove((*result) + i + 2, (*result) + i, 1 + len - i);
> +			len += 2;
> +			snprintf(esc, 4, "%%%02hhx", space);
> +			strncpy((*result) + i, esc, 3);
> +		}
> +	}

You go through a great pain to use snprintf().  Why not something like (not
tested):

+	/* encode any remaining (internal) whitespace */
+	for (i = 0; i < len; i++) {
+		if(isspace((*result)[i])) {
+			char space = (*result)[i];
+			memmove((*result) + i + 2, (*result) + i, 1 + len - i);
+ 			(*result)[  i] = '%'
+ 			(*result)[++i] = digit2lowcase_hex(space >> 4);
+ 			(*result)[++i] = digit2lowcase_hex(space & 0xF);
+			len += 2;
+		}
+	}

And a suitable digit2lowcase_hex():

static char digit2lowcase_hex(char digit)
{
  return (digit <= '9') ? (digit - 0)"0123456789" : (digit - 10)"abcdef";
}

Algorithmically it would be better to start from the end of *result buffer
as that would avoid calling memmove() but that perhaps stretching things a
bit. ;-)

It might also enhance the code quality if we would use a temporary variable
holding *result and use that.

> -	for (i = 0; (unsigned)i != len; i++) {
> +	for (i = 0; (size_t)i + 2 < len; i++) {
>  		if ((*result)[i] != '%')
>  			continue;
>  		c = tolower((*result)[i + 1]);

Wouldn't it be better to use size_t for i from the start ?

John.
-- 
John Tytgat
joty at netsurf-browser.org



More information about the netsurf-dev mailing list