English Language & Usage Asked by philshem on August 29, 2021
I recently put together some Google n-grams for a short piece on the transition of the word data into a singular (mass) noun: Data are Beautiful: Data’s story in grammar.
There was one peculiar finding:
When starting a sentence, the trend reverses. "The data are" and "Data are" are approximately twice as common as "The data is" and "Data is". (n-gram)
The best suggestion so far is that writers are more careful at the beginning of a sentence (reddit comments).
Are there any actual grammatical reasons? Or, do you have any other guesses as to why the trend reverses?
If so, I’ll add them to the story – and link back to here.
Meta: I’m not asking about the difference between data is and data are, so this question is not related to the earlier ones on ELU.
Here is an Ngram of sentence-beginning "The data is" (yellow) and "The data are" (green) versus later-in-the-sentence "the data is" (red) and "the data are" (blue):
As you note in your article, the sentence-beginning versions of these phrases show less of an inclination toward "data is" than the later-in-sentence versions do.
But consider this Ngram of sentence-beginning "The data shows" (yellow) and "The data show" (green) versus later-in-the-sentence "the data shows" (red) and "the data show" (blue):
Here the greater preference for the plural form at the beginning of a sentence versus elsewhere in the sentence is again evident, but more striking is the preference for the plural over the singular regardless of where the phrase falls in a sentence. Another interesting feature of this Ngram is that, for most of the years reported, sentence-beginning "The data show/shows" is slightly more common than later-in-the-sentence "the data show/shows"—a much different result than with "The data are/is" versus "the data are/is." I have no idea why this is so.
These results suggest that factors other than position in a sentence can have a powerful effect on the popularity of plural versus singular forms of data. In view of that, I would be hesitant to reach a broad conclusion about the overall impact of position in a sentence in such preferences.
A final caution involves Ngram Viewer charts in general: They are unreliable in various ways, starting with the OCR program's not infrequent misreading of publication dates and search strings, and the search results feature's variation in reported results depending on the time frame selected. They are pretty to look at, though.
Correct answer by Sven Yargs on August 29, 2021
Data is the plural form of datum. If you have one point of data, it is called a datum, like stratum and strata, or erratum and errata.
Now, with regard to saying data are vs. data is, that's just like UK vs. US English. In the UK, people might say "BBC are reporting [...]" because the BBC contains many individuals who are reporting. In the US, however, people might say "BBC is reporting [...]" because the BBC is a single company and the report is being made by someone acting as a representative of that company. So a UK person would claim that the phrase "the data is ..." is simply incorrect.
It's a little easier to notice when you say "The data show that [...]"; in either variety of English, it would be incorrect to say "the data shows [...]" because data is plural.
Answered by TylerH on August 29, 2021