@cbourke
Created October 18, 2018 17:17
auto-posted gist

String Processing

  • Strings represent data that may need to be processed
  • It is common to deal with CSV (comma-separated values) or similarly formatted data
  • Standard library functions can help, but processing involves designing algorithms

More Convenience functions in C

  • The ctype.h header provides several "check" functions for single characters: isalpha(c), isdigit(c), islower(c), isupper(c), isspace(c), and conversion functions: toupper(c), tolower(c), etc.
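As a rough sketch of how these functions are typically used, the two helpers below walk a string one character at a time. The names count_digits and to_upper_in_place are made up for this example. The cast to unsigned char is there because the ctype.h functions are only defined for non-negative character values (and EOF).

```c
#include <ctype.h>
#include <stddef.h>

// Hypothetical helper: counts the decimal digits in s
size_t count_digits(const char *s) {
  size_t count = 0;
  for (size_t i = 0; s[i] != '\0'; i++) {
    // cast to unsigned char: ctype functions expect a non-negative value
    if (isdigit((unsigned char) s[i])) {
      count++;
    }
  }
  return count;
}

// Hypothetical helper: converts every letter in s to upper case, in place
void to_upper_in_place(char *s) {
  for (size_t i = 0; s[i] != '\0'; i++) {
    s[i] = (char) toupper((unsigned char) s[i]);
  }
}
```

Since to_upper_in_place modifies its argument, it must be given a mutable array, not a string literal.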

String Formatting

  • In C: you can use printf to print a formatted string to the standard output
  • You can use sprintf to "print" to a string instead
  • You use it exactly the same as printf but it takes an additional first argument: the string you want to "print" to
  • It is YOUR responsibility to ensure the string is big enough
char s[100];
int x = 10;
double pi = 3.1415;
char name[] = "Chris";

//standard output:
printf("Hello, %s, you have a value of %d, and pi is %.3f\n",
name, x, pi);

sprintf(s, "Hello, %s, you have a value of %d, and pi is %.3f\n",
name, x, pi);

char *result = (char *) malloc(sizeof(char) * (strlen(s) + 1));
strcpy(result, s);
//now I have a string, result, of the EXACT size I need!
  • General strategy: you can use a "temporary buffer" like above to format a string and then once formatted, you can use strlen and malloc to create a string of the exact size you need, then copy it over, and return it if needed
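One way to package the temporary-buffer strategy as a function is sketched below. The name format_greeting and the buffer size of 200 are assumptions for this example. It also swaps sprintf for snprintf, which takes the buffer size and never writes past it, so an overly long name is detected instead of overflowing the buffer.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: formats a greeting and returns a heap-allocated
// copy of exactly the right size. The caller must free() the result.
char *format_greeting(const char *name, int x, double pi) {
  char buffer[200]; // temporary buffer; the size is an arbitrary assumption
  // snprintf writes at most sizeof(buffer) bytes, including the '\0'
  int needed = snprintf(buffer, sizeof(buffer),
      "Hello, %s, you have a value of %d, and pi is %.3f",
      name, x, pi);
  if (needed < 0 || (size_t) needed >= sizeof(buffer)) {
    return NULL; // formatting error, or the output did not fit
  }
  // allocate exactly what is needed, +1 for the null terminator
  char *result = (char *) malloc(sizeof(char) * (strlen(buffer) + 1));
  if (result == NULL) {
    return NULL;
  }
  strcpy(result, buffer);
  return result;
}
```

A caller would use it as char *g = format_greeting("Chris", 10, 3.14159); and later call free(g).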

Java

  • Java: String.format uses printf-style placeholders and returns a String
int x = 10;
double pi = 3.1415;
String name = "Chris";

String s = String.format("Hello, %s, you have a value of %d, and pi is %.3f\n",
name, x, pi);

Java: Mutable Strings

  • In Java the String class is immutable
  • In general, immutability is a Very Good Thing: immutable objects are automatically thread-safe
  • However, sometimes you might need a mutable version, StringBuilder: you can change its contents, and it shrinks/expands automatically
StringBuilder sb = new StringBuilder();
sb.append("hello");
sb.append(", ");
sb.append("World");
sb.setCharAt(0, 'H');
String s = sb.toString();

Tokenization

  • Some strings may contain formatted data: CSV, TSV (tab separated values)
  • We want to "split" a formatted string up into tokens along some delimiter (tab, space, commas)
  • In C: you can use strtok, it takes two arguments: the string you want to tokenize and a string containing one or more delimiters
  • The first time you call it, you pass the string you want to tokenize
  • Each subsequent call, you pass in NULL in order to continue tokenizing the same string
  • strtok returns a pointer to the next token, NULL when there are no more tokens
  • You can use multiple delimiters, but generally stick with one
char s[] = "Chris,Bourke,UNL,Omaha,NE";
//strtok returns a pointer to the next token, so we need to store that:
char *token = NULL;
//first argument is the string you want to tokenize
//second argument: the delimiter
//it returns a pointer to the "next" token
token = strtok(s, ",");
//the pointer returned points to a part of the tokenized string
//with the delimiter *replaced* with a null terminating character
//print it:
printf("The first token is %s\n", token);

//next token: passing in NULL tells strtok to continue tokenizing
// where it left off
token = strtok(NULL, ",");
printf("the second token is %s\n", token);

//how do we continue?
//When no further tokens are available, strtok returns NULL
//advance to the third token first so the second is not printed twice
token = strtok(NULL, ",");
while(token != NULL) {
  printf("token = %s\n", token);
  token = strtok(NULL, ",");
}
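Because strtok overwrites delimiters in the string it is given, a common precaution is to tokenize a copy when the original still matters. The sketch below does that; count_tokens is a made-up name for this example. Note that strtok treats consecutive delimiters as one, so empty fields (as in "a,,b") are skipped rather than returned as empty tokens.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: counts the comma-separated tokens in s without
// modifying s. Returns -1 if memory cannot be allocated.
int count_tokens(const char *s) {
  // strtok replaces delimiters with '\0', so work on a private copy
  char *copy = (char *) malloc(sizeof(char) * (strlen(s) + 1));
  if (copy == NULL) {
    return -1;
  }
  strcpy(copy, s);

  int count = 0;
  char *token = strtok(copy, ",");
  while (token != NULL) {
    count++;
    token = strtok(NULL, ",");
  }
  free(copy);
  return count;
}
```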

Java

  • All you need to use is the .split method
String s = "Chris,Bourke,UNL,Omaha,NE";
String tokens[] = s.split(",");
for(String str : tokens) {
  System.out.println(str);
}
  • The split method actually supports regular expressions
String s = "Chris Bourke\tUNL Omaha NE";
String tokens[] = s.split("\\s");  //splits along any whitespace character
for(String str : tokens) {
  System.out.println(str);
}

String Comparisons

  • In both languages, you CANNOT use the == operator to determine if two strings are the same
  • Instead you must use a function or method
  • In both languages, this general comparison function has the following "contract"
  • It is known as a comparator function/method: given two elements, a, b it returns:
    • something negative if a < b
    • zero if a = b
    • something positive if a > b
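  • The relative ordering is lexicographic, character by character, according to the ASCII text table (so every upper case letter comes before every lower case letter)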
  • In C: int strcmp(const char *s1, const char *s2);
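A minimal sketch of the contract in C follows. Both helper names (same_string and compare_sign) are invented for this example. strcmp only promises the sign of its result, not a particular value, so compare_sign normalizes it to -1, 0, or 1.

```c
#include <string.h>

// Hypothetical helper: returns 1 if a and b hold the same characters,
// 0 otherwise. Writing a == b would only compare the two addresses.
int same_string(const char *a, const char *b) {
  return strcmp(a, b) == 0;
}

// Hypothetical helper: reduces strcmp's result to -1, 0, or 1
int compare_sign(const char *a, const char *b) {
  int r = strcmp(a, b);
  return (r > 0) - (r < 0);
}
```

For example, compare_sign("apple", "banana") is negative, and so is compare_sign("Zebra", "apple"), because 'Z' comes before 'a' in the ASCII table.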

Java

  • You can use the .compareTo method
String a = "apple";
String b = "banana";
int result = a.compareTo(b); //something negative
result = b.compareTo(a); //something positive
result = b.compareTo(b); //zero
  • Both languages have case-insensitive versions:
  • C: strcasecmp (a POSIX function declared in strings.h, not part of the standard C library)
  • Java: .compareToIgnoreCase()
String a = "apple";
String b = "ApPlE";
int result = a.compareToIgnoreCase(b); //zero










CSCE 155H - Computer Science I Honors

Fall 2018

Strings

  • Strings are ordered sequences of characters (may be ASCII or Unicode)
  • Different languages represent strings differently
  • Most languages provide a standard library of functions/methods to process strings

Strings in C

  • In C, strings are arrays of char elements
  • BUT: in C, all strings must end with a special null-terminating character, \0 (the character whose numeric value is zero); it is NOT the same thing as the NULL pointer, the digit character '0', or void
  • You can declare/use static strings:
char message[] = "hello World!"; //a size 13 character array

//strings in C are mutable:
message[0] = 'H';
  • In the above, message is automatically null-terminated and has 12 valid characters, but is of size 13 (to accommodate the null-terminator)
  • You can treat strings in C like arrays because they are!
message[5] = '\0';

printf("message = %s\n", message); //prints "Hello"

message[5] = '_';

printf("message = %s\n", message); //prints "Hello_World!"

message[0] = '\0'; //now the "empty string"
  • What happens when a string is not properly null terminated?
  • You can get garbage results, segmentation faults, "undefined behavior"

Dynamic strings

  • You can use malloc to allocate enough space to hold whatever strings you want to represent
int n = 100;
char *name = (char *) malloc( (n+1) * sizeof(char));

name[0] = 'C';
name[1] = 'h';
name[2] = 'r';

//you should NOT reassign the pointer:
name = "Christopher Bourke";
//memory leak! name now points to the literal and the malloc'd block is lost

String Library

  • There is a standard string library in string.h
  • Dozens of useful functions

Assignment/Copy Function

  • strcpy: takes two arguments, a "destination" and a "source"
int n = 100;
char *name = (char *) malloc( (n+1) * sizeof(char));

strcpy(name, "Christopher Bourke");
//now name contains "Christopher Bourke\0"
  • strcpy will copy the null-terminating character for us!
  • However, strcpy assumes that the destination is big enough to hold what you are trying to copy
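One way to guarantee the destination is big enough is to size it from the source before copying. The sketch below does so; copy_string is a hypothetical name for this example, and the caller is responsible for calling free on the result.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: returns a heap-allocated copy of source,
// or NULL if memory cannot be allocated
char *copy_string(const char *source) {
  // strlen does not count the null terminator, so add 1 for it
  char *copy = (char *) malloc(sizeof(char) * (strlen(source) + 1));
  if (copy == NULL) {
    return NULL;
  }
  strcpy(copy, source); // safe: copy has room for every character plus '\0'
  return copy;
}
```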

String Length

  • strlen: it takes a string and returns an integer
  • It returns the number of valid characters in the string (it does NOT include the null terminating character)
char message[] = "Hello World!";
int n = strlen(message); //12
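To see why the null terminator matters, here is an illustrative re-implementation of what strlen does (my_strlen is a made-up name; in real code use the library function). It simply counts characters until it reaches '\0', which is also why a missing terminator makes it run off the end of the array.

```c
#include <stddef.h>

// Hypothetical re-implementation of strlen, for illustration only
size_t my_strlen(const char *s) {
  size_t n = 0;
  // count characters until the null terminator is reached
  while (s[n] != '\0') {
    n++;
  }
  return n;
}
```

Note the return type: the library strlen returns a size_t (an unsigned integer type), which is why this sketch does too.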

Concatenation

  • strcat: takes two arguments, the destination and the source
  • It appends or "concatenates" the source to the END of the destination string
int n = 100;
char *fullName = (char *) malloc( (n+1) * sizeof(char));
char lastName[] = "Bourke";
char firstName[] = "Chris";

strcpy(fullName, lastName);
strcat(fullName, ", ");
strcat(fullName, firstName);
//the content of fullName is "Bourke, Chris"
  • strcat takes care of the null-terminator for us
  • Like strcpy, it assumes that the destination string is big enough to hold the entire concatenated result
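Instead of guessing a size such as 100, the destination can be sized from the pieces being joined. A sketch under that assumption follows; make_full_name is a hypothetical helper name, and the caller must free the returned string.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: returns "lastName, firstName" in a new string,
// or NULL if memory cannot be allocated
char *make_full_name(const char *firstName, const char *lastName) {
  // +2 for the ", " separator and +1 for the null terminator
  size_t size = strlen(lastName) + 2 + strlen(firstName) + 1;
  char *fullName = (char *) malloc(sizeof(char) * size);
  if (fullName == NULL) {
    return NULL;
  }
  strcpy(fullName, lastName);   // start with the last name
  strcat(fullName, ", ");       // append the separator
  strcat(fullName, firstName);  // append the first name
  return fullName;
}
```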

Length-limited Versions

  • strncpy and strncat: copy or concatenate at most $n$ bytes (characters)
char firstName[] = "Christopher";
char name[6];
//copy the first 5 characters; strncpy does not add a null terminator here
strncpy(name, firstName, 5);
//often, you may need to handle the null terminator yourself:
name[5] = '\0';
  • strncpy writes the null terminating character IF and only if it appears within the first $n$ characters of the source string; strncat always appends one after the characters it copies
  • both copy at most $n$ characters: they'll stop when they see that first null-terminator
  • Using the address-of operator (&), you can also copy a "substring"
char fullName[] = "Christopher Michael Bourke";
char middleName[8];
//want to copy "Michael" into middleName
strncpy(middleName, &fullName[12], 7);
middleName[7] = '\0';
printf("middle name = %s\n", middleName);
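The same idea can be wrapped in a reusable function. The sketch below is one possible design, not a standard library function: substring is a hypothetical name, it returns a newly allocated string that the caller must free, and it shortens the request when it would run past the end of the source.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: returns a new string holding up to length
// characters of s starting at index start. Returns NULL if start is
// past the end of s or if memory cannot be allocated.
char *substring(const char *s, size_t start, size_t length) {
  size_t n = strlen(s);
  if (start > n) {
    return NULL;
  }
  if (length > n - start) {
    length = n - start; // clamp so we never read past the end of s
  }
  char *result = (char *) malloc(sizeof(char) * (length + 1));
  if (result == NULL) {
    return NULL;
  }
  strncpy(result, &s[start], length);
  result[length] = '\0'; // strncpy did not add a terminator for us
  return result;
}
```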

Strings in Java

  • In Java, strings are full objects and defined by the class String
  • There is NO null-terminating character in Java
  • Memory is managed for you (garbage collection), so there is no malloc or free: just use strings however you want!
  • Strings are immutable
String message = "hello World!";
message = "Hello World!";
  • Consequently, any string method that "changes" the contents of a string, actually returns a NEW string
String s = "hello";
s.toUpperCase();
//no effect, s is still "hello";
String t = s.toUpperCase();
System.out.println(s);
System.out.println(t);
  • Strings are immutable, but you have a mutable version called StringBuilder








