TransWikia.com

Is "data" treated as singular or plural in formal contexts?

English Language & Usage Asked on February 28, 2021

My non-native English speaking friend just asked me: “Data is…” or “Data are…”?

I said both but that’s because I’ve been desensitized from reading/writing both (especially from writing code and adding quick comments).

My question: Is it acceptable to utilize either for a university paper? Or is one safer than the other (when confronted with stickler professors)?

12 Answers

I have actually considered this quite a bit, being both a linguist who studies these things, and a scholar who publishes papers.

Etymologically speaking, the word data is the plural of datum in Latin. In Latin, data would get plural verb agreement. Now, languages borrow words and do whatever they want with them, so this historical fact about data has no relevance in judging what is "correct" in English. There is significant evidence that data has established itself as a mass noun in English, suggesting that, for most people, "data is" is the most natural way to speak.

However, in a university/scholarly paper, I would recommend using "data are", rather than "data is".

The reason: some stickler professors and pedantic scholars believe that, if datum is an English word for a single piece of data (which it is), then data must logically be plural. The fact that most people do things differently only means, to them, that most people are doing it wrong. Whether you agree with that or not is somewhat irrelevant.

So you have two choices.

  1. If you use "data is", then reasonable people (yes, I am biased) who read your paper will not bat an eye, but stickler professors might judge you on your perceived ignorance or inappropriate level of informality.

  2. If you use "data are", then the stickler professors will not judge you to be ignorant, and the reasonable people will think "that's an acceptable variant" or "this person is a stickler for language" (or if they are me, will think "this person is pandering to the sticklers — a necessary evil"), but nobody will think you are ignorant.

So, choosing (2), "data are" is clearly your safest bet, and is what I always do (and what I find nearly all of my colleagues do).

Correct answer by Kosmonaut on February 28, 2021

As addressed in the question linked, it depends if you use the uncountable noun, meaning "a collection of data", or the plural form of datum. If it is the former, then the verb would be singular, otherwise it would be plural.

Now I would say that, in most university papers, you would use the uncountable singular form. The exception would be when data describes an ensemble of measurements or when data is used in a philosophy paper. (According to Wiktionary's definition.)

Answered by Eldroß on February 28, 2021

This is intended as a clarification of the "correctness" of using data as a mass noun, for those strict-minded sticklers (there's plenty of them) who might be unconvinced by Kosmonaut's "languages borrow words and do whatever they want with them":

1 - "Datum" and "data (plural)" are historically correct, so "data (mass noun)" must be wrong. How can "data" have a mass noun form as well as a singular and plural? You'd never say "Oh, I spilled rice on the floor. Wait, it's okay, I only spilled 4 rices". There's a separate noun phrase for the singular and plural ("grains of rice").

Consider potato. It has a singular form, meaning one distinct root vegetable, a plural form, meaning multiple distinct root vegetables, and a mass form, meaning an amount of foodstuff made from potatoes. Imagine a dinner table, where each diner has a baked potato on their plate (singular), and everyone is sharing a platter of roast potatoes (plural) and a bowl of mashed potato (mass) (hopefully among other things...). If you ask someone to "pass the potato", they'll understand that you mean the bowl of mass mash, not the tray of plural potatoes or the singular potato on their plate.

2 - There can be such a thing as "a datum" in a way which is not true for "a water". Imagine someone looking at a database full of data and saying, "There is so much data in this, I can't see where to start". Surely this is like standing in a migration of birds and saying "There is so much bird in the sky, I can't see the sun..."? Since data can be countable, surely "data" can't be primarily a mass noun?

Data is not necessarily countable. Data in a neat Excel sheet might have countable cells, but what about the data that is lost when photo editors talk about "data loss" when increasing the contrast of a digital photo made of binary machine code data? There's no clear way of defining where one datum starts and the next one stops — would a datum in this context be a bit, a byte, or the data defining one pixel? Such a line would be arbitrary, like looking for units of rice in a processed flat rice cracker. It's an amount measured in units of mass — 67kb of data in a jpg, 2 grams of rice in a rice cracker.

Even seemingly trivial cases aren't so trivial. What's one datum in a modern relational database? One value, one row? What about where there are table joins and foreign keys? Is a structural definition a datum? You can create a convention-specific definition, but it's not a universal definition like one bird.

3 - Following that pattern, shouldn't the mass noun of data be datum (the singular), like how the mass noun of potatoes is potato?

No. It's rare, but not completely unique, for a mass noun to develop from a plural, in cases where the singular over time becomes less and less universally meaningful. "Physics" used to mean the set of countable, defined, distinct natural sciences - until the field developed such that it became clear that the lines between one physic and another weren't as sharp or universal as previously thought.

You could answer "What's happening at CERN?" with "A lot of physics", but you wouldn't expect the reply "How many?". This is because there's no longer a clear established universal dividing line between one physic and another. Your answer would interpret the question as, "How much?" and would be a measurement of amount: "Enough to occupy 4,000 physicists". In the same way, you could answer "What does this supercomputer store?" with "A lot of data", but the reply "How many?" would incorrectly assume that all data has one clear common countable unit and that there is a clear universal dividing line between one datum and another across all contexts. Even if this data did happen to have a consistent countable convention, replying "7 million data" would be ambiguous unless the asker already knew this convention. A more useful answer would be to interpret it as "How much?" and give an answer in terms of a measurement of amount: "Nearly 220 petabytes".

Answered by user56reinstatemonica8 on February 28, 2021

AHDEL has, for news: pl.n. (used with a sing. verb).

Collins has n (functioning as singular), not claiming it as a plural noun, though it gives both singular and plural near-synonyms in its definitions of polysemes:

  1. current events; important or interesting recent happenings

  2. information about such events, as in the mass media

Surely the way the noun is (in this case universally) used rather than its form (or its origin, from a plural noun in Middle English newes, meaning new things) decides correct usage. Tidings is treated differently.

However, data is treated as requiring singular concord by some authorities and plural concord by others - as stated in previous answers. (Amusingly, in this case Collins is slightly more prescriptive than the AHDEL!) I believe that normal non-academic usage strongly advocates singular concord, while different universities still hold different opinions. Because there are no ex-cathedra (in the absolute sense) rulings in these areas, a university must give its own preferences in an in-house style guide (as many do) and be prepared to tolerate opposing usages from other equally entitled institutions. Students should make sure that, in submitted work, they follow the style guide of the university that will ultimately pass or fail them.

Answered by Edwin Ashworth on February 28, 2021

I'm a strong proponent of data as a mass noun, taking the singular in grammatical usage ("the data shows us something"). Use of data as a plural ("the data show us...") seems pretentious and pedantic, as if to make a show of your knowledge that in Latin, data is a plural form of datum.

I have several reasons for being stubborn about data as a mass noun:

  1. Datum is a reference line in a mechanical drawing. More than one of these may be called data, if you must show off your knowledge of Latin, but I think in this case they'll usually be referred to as datum lines.

  2. If you can tell me how many data you have, then I will use plural verbs to refer to your data, but as long as you need quantitative units to tell me the size of your data set, then I will call it a collective singular: The data is too big to load into memory. My storage holds up to 1 TB of data. There are not 1 trillion data in there, however.

  3. No data point can stand on its own, but rather it derives meaning and significance from its context. What were the conditions of its measurement? What were the other measurements? Etc. It doesn't make semantic sense to refer to a single datum unless it has that specific meaning, as a reference point or baseline. What we mean by data as a plural is semantically different from what we mean by data as a collective singular.

Answered by Tim D on February 28, 2021

Resurrecting this, I just answered a dupe of this, so here was my answer there:

Essentially, this comes down to "It's plural if you want it to be." I never liked that answer, either.

However, people really like arguing about this.

Etymologically, data comes from Latin. This is well-known. Unfortunately, in Latin, its plurality was defined by devices that exist in English only in a far lesser capacity: gender and noun case.

In the Latin nominative case, data could be either the neuter plural or the feminine singular of datum. The neuter singular was datum, the masculine singular datus, the feminine plural datae, and the masculine plural dati.

Use of data as a plural in English (the earlier form) comes from a suggestion that we should incorporate the words closest in Latin meaning to how they will be used in English: the neuter singular datum and the neuter plural data.

However, data could also function as the feminine singular in Latin, which I conjecture led to its commonplace use as a mass noun in English.

I enjoy using these words as they were used in Latin: in a survey of male students, I might say "After the dati were collected, each outlying datus was removed." In a survey of female students, I might say "After the datae were collected, each outlying data was removed." In a survey of pineapples, I might say "After the data were collected, each outlying datum was removed."

Most people do not enjoy this. The first two usages are not by any means commonplace (possibly even unattested outside random tangents on the internet), with the third occasionally seen as archaïc but often accepted or even preferred, with data used as plural.

It is more common today, however, to use data as a mass noun; that is, "the data was collected," not "each data was collected." Datum remains typical in the latter case.

Answered by Tortoise on February 28, 2021

"a bowl of mashed potato" This is very strange, at least in US usage. In my 67 years I don't think I have ever heard anyone say other than "a bowl of mashed potatoes," and I've lived from North Carolina to Indiana to New York to Connecticut to Michigan to Louisiana and back to North Carolina by way of Tennessee. A mashed potato is possible, but only if one mashes but one potato.

Data works well as a mass/common noun with a singular verb as long as the data is of a uniform nature (concerning only per capita income). If the data is varied (concerning both per capita income and books purchased per unit of time), then it is not a mass/common noun and would do well to take a plural verb.

It would only seem reasonable to vary datum's declension according to sex if it were used as an adjective. In the examples above, it is a noun. In those examples, only the neuter datum/data, "a thing/the things given," would seem appropriate. In any case, one should note that the Latin 2nd declension masculine plural ending "i" was pronounced much as the English "ee," and the Latin 1st declension feminine plural "ae" was pronounced similarly to the English "eye," which rather confuses both the point and any listeners who might care to understand why one would choose to say such things.

Answered by Christopher C Tew on February 28, 2021

I would regard "the data is consistent" and "the data are consistent" as having slightly different meanings. The former carries an implication that some particular (possibly large) set of information was completely examined and found to be consistent. The latter carries an implication that multiple independent pieces of information were examined, but does not particularly imply that any particular identifiable set of information was examined completely. Neither implication is absolute, but in some situations I would suggest the singular form as more appropriate, and in others I would suggest the plural.

Answered by supercat on February 28, 2021

Longman Dictionary of Contemporary English says:

After data, you can use a singular verb or, in formal or technical English, a plural verb.

Example: The data is collected by trained interviewers. These data are summarized in Table 5.

Do not say 'datas' or 'a data'.

And based on Oxford:

Data is used as a plural noun in English while the singular is datum.

Answered by M.N on February 28, 2021

Technically plural; your instincts are correct. But colloquial usage probably allows for either at this late date in many dictionaries, I would guess.

Answered by Leland jacobus on February 28, 2021

It's a great example of a word in transition.

"Traditionally" it was the plural form of datum.

The fact is though, more and more "authorities" are using it as a singular.

"The Oxford English Dictionary defines it like this:

In Latin, data is the plural of datum and, historically and in specialized scientific fields, it is also treated as a plural in English, taking a plural verb, as in the data were collected and classified. In modern non-scientific use, however, despite the complaints of traditionalists, it is often not treated as a plural. Instead, it is treated as a mass noun, similar to a word like information, which cannot normally have a plural and takes a singular verb. Sentences such as data was (as well as data were) collected over a number of years are now widely accepted in standard English."

In contrast to that:

"The official view from the Office for National Statistics takes the traditional approach. The ONS style guide for those writing official statistics says:

The word data is a plural noun so write "data are". Datum is the singular."

http://www.theguardian.com/news/datablog/2010/jul/16/data-plural-singular

It's worth remembering the priceless words from an introduction to the OED: "This book is descriptive, not prescriptive."

Once again, "data" is a great example of a word in transition. A reminder that with questions of spelling and grammar, the concept of what's "right" is a difficult one. All truth is social, and all the more so with language correctness.

Answered by Fattie on February 28, 2021

According to the IEEE Style Guide (archive link here), under "Some Common Mistakes":

The word “data” is plural, not singular.

So if your formal context is an IEEE venue, the answer is authoritatively "plural."

Answered by WBT on February 28, 2021
