TransWikia.com

Parse a simple key=value config file in C

Code Review Asked by Sedmaister on November 28, 2021

Write a program that should read a file formatted strictly like
this below:

$ cat sample

bingo=2
bingo2=939
bingo3=text

The left and right side of = are consecutively parameter name and
its respective value. The values can be integer or strings. Each parameter is
separated by a newline. The parameter name and its value will not be more than
127 bytes.

This program is fed an input search token, and if found, parse_param_file() will fill the input arguments with string or integer value.

#include <stdio.h>
#include <string.h>
#include <stdlib.h> 

#define TEXT "sample"

/* 
 * Function: parse_param_file
 * Arguments:
 * token    - param name that needs to be looked up
 * buf      - buffer for string value. Pass NULL if unwanted
 * buf_int  - buffer for integer value. Pass NULL if unwanted
 * filename - string formatted path to the file 
 */
int parse_param_file(const char *token, char **buf, int *buf_int,
        const char *filename) {
    if (!token || (!buf && !buf_int)) {
        printf("%s: invalid argsn", __FUNCTION__);
        return 1;
    }
    FILE *f = fopen(filename, "r");
    if (f == NULL) {
        printf("%s: %s not foundn", __FUNCTION__, filename);
        return 1;
    }
    char *left = malloc(128);
    char *right = malloc(128);
    char *dlim_ptr, *end_ptr;
    char fbuf[256];
    while (fgets(fbuf, 256, f)) {
        dlim_ptr = strstr(fbuf, "=");
        end_ptr = strstr(dlim_ptr, "n");
        strncpy(left, fbuf, dlim_ptr - fbuf);
        strncpy(right, dlim_ptr + 1, end_ptr - dlim_ptr - 1);
        if (strcmp(left, token) == 0) {
            /* Param found */
            if (buf != NULL)
                *buf = strdup(right);
            if (buf_int != NULL)
                *buf_int = atoi(right);
            fclose(f);
            free(left);
            free(right);
            return 0;
        }
    }
    fclose(f);
    printf("%s: Param not found in %sn", __FUNCTION__, filename);
    free(left);
    free(right);
    return 1;
}

/* How to use parse_param_file() */
int main(void) {
    char *buf = malloc(128);
    int val = 0;

    /* Test 1 */
    if (parse_param_file("bingo2", NULL, NULL, TEXT))
        fprintf(stderr, "parse_param_file() failedn");
    printf("buf : %sn", buf);
    printf("int : %dn", val);
    /* Test 2 */
    if (parse_param_file("bingo3242", NULL, &val, TEXT))
        fprintf(stderr, "parse_param_file() failedn");
    printf("buf : %sn", buf);
    printf("int : %dn", val);
    /* You must clear *buf and val if you want to reuse them as args again */
    /* Test 3 */
    if (parse_param_file("bingo3", &buf, &val, TEXT))
        fprintf(stderr, "parse_param_file() failedn");
    printf("buf : %sn", buf);
    printf("int : %dn", val);

    free(buf);
    return 0;
}

6 Answers

Undefined behavior

strcmp(left, token), strdup(right) are UB as left, right do not certainly point to strings (null character terminated arrays).

Could replace

strncpy(left, fbuf, dlim_ptr - fbuf);

with below to insure null termination.

sprintf(left, "%.*s", (int) (dlim_ptr - fbuf), fbuf);

Answered by chux - Reinstate Monica on November 28, 2021

Ok, first some comments. I’ll add code if you want - just can’t do it effectively from my phone.

  1. Basically any string function that doesn’t take the length of the string is problematic. It has the potential to read beyond the end of the buffer. atoi should be replaced with strtol, strcmp with strncmp, ...

    1. Allocating fixed sizes of memory, just to release the fixed sizes of memory is pointless. Don’t do that. Now having the allocation and the free in the same function is very beneficial, do that over allocating something buried in a function and remembering to deallocate it later.
  2. The standard string library is very slow. It is ok for very simple things, as long as you only need to do them once. If you need to do this faster, you might want to use a tool like lex or flex to do so. You can also write your own lexical scanner based on deterministic automata - it’s a good exercise, but flex is much easier.

Ok, I’ll have to fix my formatting later.

Answered by Robert Baron on November 28, 2021

Well I don't know exactly what strstr does, but I would've used strtok (string tokenizer) somewhere in the process as its purpose is to find delimiters and split strings there.

Answered by der bender on November 28, 2021

__func__

Unlike __FUNCTION__, __func__ is standard C since C99. Some compilers might not support C99 and therefore in those compilers you have to use whatever they provide, but if your compiler supports C99, you should use __func__. If you have a crappy compiler, think about using a better one.


magic numbers

What is a magic number, and why is it bad?

Don't use any numbers different than 0, 1, or 2 in your code. The only place where numbers deserve to go is in constant macros like this:

#define MAX_STRLEN   (127)

Note: As Roland Illig pointed out in his comment, this is still a magic number, but it is already much better than the alternative. To improve it, a comment about why 127 and not another number, would be very appropriate just above the definition.


BUFSIZ

When you create a buffer, and you don't need a strict size (when you need a very large buffer, or you are short of memory), you can use BUFSIZ which is defined in <stdio.h>, which is around 1KiB~4KiB.

#include <stdio.h>

...

char fbuf[BUFSIZ];

ARRAY_SIZE()

Use this macro wherever possible.

Typically defined as: #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

It makes your code safer:

If you ever change the size of an array, it will be updated, while if you hardcoded the size as a magic number, you will have to update every number, and if you hardcoded it as a macro (sometimes, ARRAY_SIZE can't be used, and this is the only possibility), if you change the name of the macro, you will have to update it.

#include <stdio.h>

#define ARRAY_SIZE(arr)     (sizeof(arr) / sizeof((arr)[0]))

...

char fbuf[BUFSIZ];
while (fgets(fbuf, ARRAY_SIZE(fbuf), f)) {

Note: Never use sizeof where ARRAY_SIZE should be used. It's very unsafe, because if an array ever changes to be a pointer (for example if it is a function parameter), ARRAY_SIZE will raise a warning on recent compilers, while sizeof will give wrong results, and produce hard to find bugs.

Answered by alx on November 28, 2021

Your program crashes when it reads a line from the parameters file that doesn't contain an = sign.

The name of the macro TEXT is misleading. It doesn't contain a text but a filename.

Whenever you output an error message, it belongs on stderr instead of stdout. To do this, replace printf( with fprintf(stderr,.

Answered by Roland Illig on November 28, 2021

Interface

I'm not overly fond of the interface you've defined to the function. I think trying to combine reading integers and reading strings into a single function makes it more difficult to use. For most C code, I think the old guiding principle of UNIX ("do one thing, and do it well") provides excellent guidance. As such, I'd probably have two separate functions, one for reading a string, the other for reading a number.

I haven't tested it to be sure, but at first glance it also looks like if you ask to read a number, it does a poor job of signalling failure. For example, if your file has: foo=0 and you try to read foo as a number, it'll set the number to 0, and return 0. But if the file contains foo=Joe instead, it looks to me like it'll do exactly the same.

I'd prefer that if it couldn't convert the rest of the line after the = to a number that it return 1 to indicate that it has failed.

Memory Allocation

Since you only use our left and right inside of your function, and free them again before returning, there's probably no need to allocate them on the heap. You can just allocate them as local arrays, and use them. The obvious exception to this would be if you were targeting a tiny micro-controller that might well have less than 256 bytes available on the stack. In that case, however, you usually won't have a file system either, so the entire function becomes irrelevant.

Standard Library

I think judicious use of fscanf can simplify the code quite a bit.

For reading a string, it seems like the obvious interface would be to just return the value part of the pair, and if it's not found, return a NULL pointer:

char *read_string(FILE *file, char const *desired_name) { 
    char name[128];
    char val[128];

    while (fscanf(file, "%127[^=]=%127[^n]%*c", name, val) == 2) {
        if (0 == strcmp(name, desired_name)) {
            return strdup(val);
        }
    }
    return NULL;
}

When reading a number, we probably need something closer to your original interface, with a return value to indicate success/failure, and receive a pointer to an int where we write the value (if it's found):

int read_int(FILE *file, char const *desired_name, int *ret) {
    char *temp = read_string(file, desired_name);

    char *stop;
    *ret = strtol(temp, &stop, 10);
    int ret_val = stop == NULL || *stop != '';
    free(temp);
    return ret_val;
}

Note that as it stands right now, the return value reflects converting the entire remainder of the line (everything after the =) to the desired number. If part of it can be converted, that part will be written to the destination, but the return will still indicate failure. For example, if you had foo=1234abcd, attempting to read foo as a number would give you 1234, but return 1 because the abcd part couldn't be converted.

It would also be possible to simply ignore any trailing garbage, so foo=1234abcd would read the 1234 part, and return 0 to signal success:

int read_int(FILE *file, char const *desired_name, int *ret) {
    char *temp = read_string(file, desired_name);

    char *stop;
    *ret = strtol(temp, &stop, 10);
    int ret_val = stop == temp;
    free(temp);
    return ret_val;
}

In this case, we just need to check that at least one character was converted, in which case stop will have be advanced at least one character after the input.

Answered by Jerry Coffin on November 28, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP