How to look up a cell value of a multi-column csv file?

Question

Following up this answer, how can I have an optional argument of the command getValue that specifies the column in which I look up the cell value so that it has the following syntax?

getValue<macro name>%
[<optional column number with default of 2, and the counting starts with 1 at the very first column>]%
{<search phrase>}%
{<data of the csv file>}

For example, getValueSndParThrd[4]{Second Parameter}{mydata} (after tweaking the following code) should store 44 in SndParThrd.

Also, getValueSndParFrst{Second Parameter}{mydata} should store 3 in SndParFrst.

begin{filecontents*}{test.csv}
Third Parameter  , 7 , 9  , 
First Parameter  , 5 , 10 ,
Second Parameter , 3 , 6  , 44
end{filecontents*}

documentclass{article}
usepackage{xparse}

ExplSyntaxOn
% Step 1: reading the file
ior_new:N l__diaa_csv_ior
ior_new:N l__diaa_csv_str_bool
NewDocumentCommand ReadCSV { s m >{ SplitArgument {1} {,} }O{ 1,2 } m }
  {
    IfBooleanTF {#1}
      { bool_set_true:N l__diaa_csv_str_bool }
      { bool_set_false:N l__diaa_csv_str_bool }
    diaa_csv_read:nnnn {#2} #3 {#4}
  }
cs_new_protected:Npn diaa_csv_read:nnnn #1 #2 #3 #4
  {
    prop_new:c { g__diaa_csv_#1_prop }
    ior_open:NnTF l__diaa_csv_ior {#4}
      { __diaa_csv_read:cnn { g__diaa_csv_#1_prop } {#2} {#3} }
      { msg_error:nnn { diaa } { file-not-found } {#4} }
  }
msg_new:nnn { diaa } { file-not-found }
  { File~`#1'~not~found. }
cs_new_protected:Npn __diaa_csv_read:Nnn #1 #2 #3
  {
    bool_if:NTF l__diaa_csv_str_bool
      { ior_str_map_inline:Nn }
      { ior_map_inline:Nn }
        l__diaa_csv_ior
        {
          prop_put:Nxx #1
            { clist_item:nn {##1} {#2} }
            { clist_item:nn {##1} {#3} }
        }
  }
cs_generate_variant:Nn __diaa_csv_read:Nnn { c }
%
% Step 2: getting the values
NewDocumentCommand getValue { m m m }
  { tl_set:Nx #1 { diaa_csv_item:nn {#2} {#3} } }
NewExpandableDocumentCommand CSVItem { m m }
  { diaa_csv_item:nn {#1} {#2} }
cs_new:Npn diaa_csv_item:nn #1 #2
  { prop_item:cn { g__diaa_csv_#2_prop } {#1} }
ExplSyntaxOff

begin{document}

ReadCSV{mydata}{test.csv}

getValuerdPar{Third Parameter}{mydata}
rdPar

edefstPar{CSVItem{First Parameter}{mydata}}%
stPar

end{document}

csv xparse

frougon · Accepted Answer

This is a modified version of Phelype Oleinik's answer that you linked to in the question. The first method is shorter, the second is better. :-)
First method
Column numbers in the CSV file start at 1. The new commands have the following syntaxes:
ReadCSV[*] {label} [key column] {value columns} {file}

where key column defaults to 1 and value columns is a comma-separated list of column numbers (only values from these columns are stored by ReadCSV to allow later retrieval by getValue and CSVItem).
getValue {macro} [value column] {key} {label}

This is what you asked for in the question. value column defaults to 2. The contents of value column corresponding to the specified key is stored in macro (which could be an expl3 token list variable).
CSVItem {key} [value column] {label}

This command is fully expandable. It recursively expands to the contents of value column corresponding to the specified key. value column defaults to 2, as in getValue.
begin{filecontents*}{test.csv}
Third Parameter  , 7 , 9  ,
First Parameter  , 5 , 10 ,
Second Parameter , 3 , 6  , 44
end{filecontents*}

documentclass{article}
usepackage{xparse}

ExplSyntaxOn
% Step 1: reading the file
ior_new:N l__diaa_csv_ior
bool_new:N l__diaa_csv_str_bool
seq_new:N l__diaa_csv_tmp_seq

% str mode (bool/star), label, key column, value columns, file
NewDocumentCommand ReadCSV { s m O{1} m m }
  {
    IfBooleanTF {#1}
      { bool_set_true:N l__diaa_csv_str_bool }
      { bool_set_false:N l__diaa_csv_str_bool }
    diaa_csv_read:nnnn {#2} {#3} {#4} {#5}
  }

% label, key column, value columns, file
cs_new_protected:Npn diaa_csv_read:nnnn #1 #2 #3 #4
  {
    seq_set_split:Nnn l__diaa_csv_tmp_seq { , } {#3}
    seq_map_inline:Nn l__diaa_csv_tmp_seq
      { prop_new:c { g__diaa_csv_#1_##1_prop } }

ior_open:NnTF l__diaa_csv_ior {#4}
      {
        __diaa_csv_read:nn {#1} {#2}
        ior_close:N l__diaa_csv_ior
      }
      { msg_error:nnn { diaa } { file-not-found } {#4} }
  }

msg_new:nnn { diaa } { file-not-found }
  { File~`#1'~not~found. }

% label, key column
cs_new_protected:Npn __diaa_csv_read:nn #1 #2
  {
    bool_if:NTF l__diaa_csv_str_bool
      { ior_str_map_inline:Nn }
      { ior_map_inline:Nn }
        l__diaa_csv_ior
        {
           seq_map_inline:Nn l__diaa_csv_tmp_seq
             {
               prop_put:cxx { g__diaa_csv_#1_####1_prop }
                 { clist_item:nn {##1} {#2} }
                 { clist_item:nn {##1} {####1} }
             }
        }
  }

% Step 2: getting the values
% macro or tl var, value column, key, label
NewDocumentCommand getValue { m O{2} m m }
  { tl_set:Nx #1 { diaa_csv_item:nnn {#3} {#2} {#4} } }

% key, column containing the desired value, label
NewExpandableDocumentCommand CSVItem { m O{2} m }
  { diaa_csv_item:nnn {#1} {#2} {#3} }

% key, column containing the desired value, label
cs_new:Npn diaa_csv_item:nnn #1 #2 #3
  { prop_item:cn { g__diaa_csv_#3_#2_prop } {#1} }
ExplSyntaxOff

begin{document}

% Use default column for the key (1)
ReadCSV{mydata}{2,4}{test.csv} % discard the third column

getValuerdPar[4]{Second Parameter}{mydata}
rdPar                         % 44

getValuerdPar{Second Parameter}{mydata}
rdPar                         % 3

getValuerdPar{Third Parameter}{mydata}
rdPar                         % 7

edefrdPar{CSVItem{First Parameter}{mydata}}%
rdPar                         % 5

edefrdPar{CSVItem{First Parameter}[4]{mydata}}%
ifxrdParempty
  textlangle emptytextrangle
else
  rdPar
fi

end{document}

Second method
This is based on the previous code, but:

parsing of the CSV lines is now done with seq_set_split:Nnn instead of clist functions in order not to ignore empty cells;

it is more memory-efficient (only one l3prop is used to store all selected columns);

column addressing is different: when retrieving values with getValue or CSVItem, column 1 is the first selected column, column 2 is the second selected column, etc. Accordingly, the default column number when retrieving values with these functions is 1 (instead of 2).

getValue now has a star version that performs a global assignment.

Since you want autodetection of the number of columns, I had to change the syntax of ReadCSV as follows:
ReadCSV[*] [key column] {label} [value columns] {file}

key column defaults to 1 as above. However, when value columns is blank, the first line of file is read to determine the number n of columns, and value columns are defined to be all columns from 2 to n (both inclusive).
Spaces on both sides of column-delimiter commas are ignored when reading the file, and one set of outer braces around each cell contents is removed, if any (this is illustrated by the cell containing foo, bar in the sample CSV file below).
begin{filecontents*}{test.csv}
Third Parameter  , 7 , 9          ,
First Parameter  , 5 , {foo, bar} ,
Second Parameter , 3 , 6          , 44
end{filecontents*}

documentclass{article}
usepackage{xparse}

ExplSyntaxOn
% Step 1: reading the file
ior_new:N l__diaa_csv_ior
bool_new:N l__diaa_csv_str_bool
seq_new:N l__diaa_csv_tmp_seq

% str mode (bool/star), key column, label, value columns, file
NewDocumentCommand ReadCSV { s O{1} m O{} m }
  {
    IfBooleanTF {#1}
      { bool_set_true:N l__diaa_csv_str_bool }
      { bool_set_false:N l__diaa_csv_str_bool }
    diaa_csv_read:nnnn {#3} {#2} {#4} {#5}
  }

% label, key column, value columns, file
cs_new_protected:Npn diaa_csv_read:nnnn #1 #2 #3 #4
  {
    tl_if_blank:nTF {#3}       % Detect number of columns and use 2 to last
      {
        ior_open:NnTF l__diaa_csv_ior {#4}
          {
            bool_if:NTF l__diaa_csv_str_bool
              { ior_str_get:NN }
              { ior_get:NN }
              l__diaa_csv_ior l_tmpa_tl

ior_close:N l__diaa_csv_ior
            seq_set_split:NnV l_tmpa_seq { , } l_tmpa_tl
            seq_clear:N l__diaa_csv_tmp_seq
            int_step_inline:nnn { 2 } { seq_count:N l_tmpa_seq }
              { seq_put_right:Nn l__diaa_csv_tmp_seq {##1} }
          }
          { msg_error:nnn { diaa } { file-not-found } {#4} }
      }
      { seq_set_split:Nnn l__diaa_csv_tmp_seq { , } {#3} } % explicit columns

ior_open:NnTF l__diaa_csv_ior {#4}
      {
        prop_new:c { g__diaa_csv_#1_prop }
        __diaa_csv_read:nn {#1} {#2}
        ior_close:N l__diaa_csv_ior
      }
      { msg_error:nnn { diaa } { file-not-found } {#4} }
  }

msg_new:nnn { diaa } { file-not-found }
  { File~`#1'~not~found. }

cs_generate_variant:Nn prop_put:Nnn { cxV }

% label, key column
cs_new_protected:Npn __diaa_csv_read:nn #1 #2
  {
    bool_if:NTF l__diaa_csv_str_bool
      { ior_str_map_inline:Nn }
      { ior_map_inline:Nn }
        l__diaa_csv_ior
        {
          seq_set_split:Nnn l_tmpa_seq { , } {##1} % split one CSV row
          tl_clear:N l_tmpa_tl
          seq_map_inline:Nn l__diaa_csv_tmp_seq
            {
              tl_put_right:Nx l_tmpa_tl { { seq_item:Nn l_tmpa_seq {####1} } }
            }

prop_put:cxV { g__diaa_csv_#1_prop }
            { seq_item:Nn l_tmpa_seq {#2} }
            l_tmpa_tl
        }
  }

% Step 2: getting the values
% star → global assignment, macro or tl var, value column, key, label
NewDocumentCommand getValue { s m O{1} m m }
  {
    IfBooleanTF {#1} { tl_gset:Nx } { tl_set:Nx }
      #2 { diaa_csv_item:nnn {#4} {#3} {#5} }
  }

% key, value column, label
NewExpandableDocumentCommand CSVItem { m O{1} m }
  { diaa_csv_item:nnn {#1} {#2} {#3} }

cs_generate_variant:Nn tl_item:nn { f }

% key, value column, label
cs_new:Npn diaa_csv_item:nnn #1 #2 #3
  {
    tl_item:fn { prop_item:cn { g__diaa_csv_#3_prop } {#1} } {#2}
  }
ExplSyntaxOff

begin{document}

% Use default column for the key (1). The second empty optional argument (list
% of value columns) means we want to autodetect the value columns; then, the
% first column is for keys and all other columns are used as value columns.
ReadCSV{mydata}{test.csv}

getValuerdPar{Second Parameter}{mydata}
rdPar                         % 3

getValuerdPar[2]{Second Parameter}{mydata}
rdPar                         % 6

getValuerdPar[3]{Second Parameter}{mydata}
rdPar                         % 44

getValuerdPar{Third Parameter}{mydata}
rdPar                         % 7

edefrdPar{CSVItem{First Parameter}{mydata}}%
rdPar                         % 5

edefrdPar{CSVItem{First Parameter}[2]{mydata}}%
rdPar                         % foo, bar

edefrdPar{CSVItem{First Parameter}[3]{mydata}}%
ifxrdParempty
  textlangle emptytextrangle
else
  rdPar
fi

end{document}

Steven B. Segletes · Answer

With the readarray package...
Note, the leading/trailing spaces in the input data are retained, for good or ill.
begin{filecontents*}{test.csv}
Third Parameter  , 7 , 9  , 
First Parameter  , 5 , 10 ,
Second Parameter , 3 , 6  , 44
end{filecontents*}

documentclass{article}
usepackage{readarray,pgffor}
newcommandgetValue[3][2]{%
  deftmpA{#2}%
  foreachz in{1,...,nrows}{%
    arraytomacro#3[z,1]tmpB
    ifxtmpAtmpB #3[z,#1]fi
  }%
}

begin{document}
readarraysepchar{,}
readdef{test.csv}mydata
readarraymydatamyarray[-,ncols]
nrows{} rows in data

Cell(2,2): myarray[2,2]

Cell(3,4): myarray[3,4]

getValue{Third Parameter }{myarray}
and
getValue[4]{Second Parameter }{myarray}
end{document}

How to look up a cell value of a multi-column csv file?

2 Answers

First method

Second method

Add your own answers!

Ask a Question