Edge cases

Mar. 2nd, 2004 01:02 pm
dtm: (Default)
[personal profile] dtm
Edge cases are hard. Off-by-one errors are painful. (see also page 25 of this pdf)

Why, then, do programmers seem to delight in creating edge cases for themselves?

I remember as an undergrad. helping out in the CS lab at Carleton. One of the first very painful assignments for people in the second CS course in our CS sequence involved making and using singly linked lists. Thorny edge cases all over the place, and with pointers too.

Now, there are some edge cases here, but not nearly as many as people think, if they simply use slightly different variables.


Many people, when confronted with a singly-linked list defined by:
typedef struct _listNode {
  int data;
  struct _listNode *next;
} listNode;

typedef list *listNode;

will write a "pop" function (find the last thing in the list, return the data, and free the listnode) something like this:
int pop (list *myList) {
  int retval;
  listNode *myNode;
  listNode *prevNode;
  if (*myList == NULL) {
    fprintf(stderr, "Can't pop() an empty list!\n");
    return -1;  # Professor said complain and return -1 on empty list
  }
  if ((*myList)->next == NULL) {
    retval = (*myList)->data;
    free(*myList);
    *myList = NULL;
    return retval;
  }
  prevNode = (*myList)->next; ###
  myNode = prevNode->next;
  while (myNode->next != NULL) {
    prevNode = myNode;
    myNode = myNode->next;
  }
  retval = myNode->data;
  prevNode->next = NULL;
  free(myNode);
}

Note that this code breaks nastily on a two element list. The common fix to this is to add another special case. (The slightly less common fix is to realize that the line marked with ### should be changed)

My preferred approach is this:
int pop (list *myList) {
  int retval;
  listNode **thingToChange;
  listNode *myNode;

  if (*myList == NULL) {
    fprintf(stderr, "Can't pop() an empty list!\n");
    return -1;  # Professor said complain and return -1 on empty list
  }
  thingToChange = myList;
  myNode = *myList;

  while (myNode->next != NULL)
  {
    thingToChange = & (myNode->next);
    myNode = myNode->next;
  }
  
  retval = myNode->data;
  *thingToChange = NULL;
  free(myNode);
  return retval;
}

Note that there is now only one special case - a bit more interesting is push(), which can be written with no special cases, and doubly-linked lists, which under the naive implementation are chock full of special cases. I'll admit that in this simple example the extra special case seems worth the effort.


Maybe special cases are the kind of thing that creeps up on you - it's easy to do just one, but soon you end up with a dozen tricky edge cases because adding each one was easier than re-thinking your algorithm. However, I often find that special cases are added in preference to using a few extra variables, or allocating a little bit more memory.

I'm thinking about this because I recently dealt with someone at work being burried under special cases. The specific problem has been simplified; I'll include it in an lj-cut below. When he called me, he'd just gotten through trying to work out how to handle all his different test cases, which were ending up in at least four tricky edge cases, and several flag variables that he was having trouble keeping separate. In fact, I was able to simplify the main loop to have just three if statements (none nested), and two of those if statements were identical.

You will be given as a parameter a file that contains some number of lines (all distinct) in alphabetic order. Each line will match the regular expression ^\w+\.[A-TV-Z]$. Each line can be interpreted as "option name DOT exchange letter".

You are to produce as output a file also sorted in alphabetic order which contains the same lines as the input file plus a line ending in ".U" for each distinct option name in the original file. For example, if given:
7A100A.A
7A100A.P
7A100A.W
7A100A.X
7A100A.Y
7A100B.W
7A100B.X
7A100B.Y
7A100C.A
7A100C.B

you should produce:
7A100A.A
7A100A.P
7A100A.U
7A100A.W
7A100A.X
7A100A.Y
7A100B.U
7A100B.W
7A100B.X
7A100B.Y
7A100C.A
7A100C.B
7A100C.U

Because of the huge size of the files involved, anything that entails reading the whole file into memory really isn't an option.

You can see the edge cases all over, right? Well, they aren't actually necessary, if you use a few extra variables.


Here's the solution:



(extra space in case you don't want to see the solution)



#!perl

use strict;
use vars qw($PrevU $PrevUPrint);

$PrevU = $PrevUPrint = '';

while (<>)
{
  if ($PrevU lt $_) {print $PrevUPrint; $PrevUPrint = '';}
  my $thisU = $_;
  $thisU =~ s/\.[A-Z]$/.U/ or die "Format error";
  if ($thisU ne $PrevU)
  {
    $PrevU = $PrevUPrint = $thisU;
  }
  if ($PrevU lt $_) {print $PrevUPrint; $PrevUPrint = '';}
  print;
}
print $PrevUPrint;

The special cases ("have I printed out the .U yet? Do I have to do stuff at the end of the loop?") are all taken care of by having the variable $PrevUPrint just set to the appropriate NullObject. (In the case of print, that's the empty string)

Date: 2004-03-03 09:23 am (UTC)
From: [identity profile] eyelessgame.livejournal.com
cool solution -- one that many people would miss because they'd reflexively chomp after reading each line.

By the way, in your C snippet, it's

typedef listNode *list;

not the other way around.

The linked list example is a fun problem. The lesson is that many special cases can be resolved with an extra layer of indirection -- one could argue that a lot of object-oriented programming relies on exactly this statement.

Yeesh...

Date: 2004-04-07 09:47 pm (UTC)
From: (Anonymous)
Of course, any algorithm which requires scanning to the end of a singly linked list (on a routine basis) is probably using the wrong data structure. I suppose you could be stuck with someone else's bad data structure choice, however.

September 2024

S M T W T F S
1234567
891011121314
15161718192021
22232425 262728
2930     

Most Popular Tags

Page Summary

Style Credit

Expand Cut Tags

No cut tags
Page generated Jun. 29th, 2025 03:22 am
Powered by Dreamwidth Studios