A Corpus-based Study of English Vocabulary in Art Research Articles

Learning English as a foreign language is an additional burden for art majors. This study examined high frequency words in art research articles in order to improve the efficiency of art majors’ English learning, especially their academic reading and writing. To this end, the study built a corpus of art research articles, analyzed its vocabulary, and compared the data with three base word lists. We found that the General Service List (GSL) and the Academic Word List (AWL) had high coverage in our corpus, and that the order of high frequency words in the Art Research Article Corpus (ARAC) differed from that in the base lists. These findings provide some implications for teaching English to art majors.


Introduction
In China, art majors have considerable difficulty acquiring English as a foreign or second language. They take a specialized entrance examination for art colleges that emphasizes candidates' professional skills. To score better in that examination, art students concentrate on developing their expertise in art. Their English proficiency is comparatively lower than that of other majors (Zhang et al., 2008; Wang, 2012; Li, 2014; Yuan & Wang, 2015). Once they are admitted to art colleges, however, they have to take the common core course, i.e. English communication. Thus, it is a challenge for English educators to teach English communication in art colleges. Teaching the course effectively is not only essential for students to learn about foreign culture, communicate in master classes delivered by foreign artists, and read research articles in international journals, but also strategic for the whole country to build a strong cultural nation, realize cultural renaissance, and engage in more international cultural interactions.
Many studies (e.g. Zhang, 2010; Li, 2012) indicate that the mainstream English pedagogy in Chinese art colleges is still the traditional grammar-translation approach and that the class is still teacher-centered. In the author's college, English teachers use the common textbooks, which are similar to those of other colleges or majors. Many students show little interest in the English communication class, partly because the learning materials are old-fashioned and partly because the textbooks do not match art majors' current English level. Liu (2010) maintains that art majors' small vocabulary size is the primary cause of their lower English proficiency. She also states that, because of their inadequate vocabulary, art majors cannot understand some simple texts and articles, let alone speak, listen to, or write English. These statements echo Hu and Nation (2000), who note that vocabulary size predicts how much can be understood. Vocabulary is therefore essential in the acquisition of English: the bigger a reader's vocabulary, the better the reader can understand the reading materials.
Lexicology researchers find that different majors have particular needs and learning foci at different learning stages (Coxhead, 2000; Nation, 2001). Vocabulary needs vary across disciplines. For instance, Ward (2009) created a basic engineering English word list and suggested that less proficient learners should prioritize lexis learning. Valipouri and Nassaji (2013) analyzed a corpus of chemistry research articles and suggested that students should learn vocabulary according to their discipline-specific needs. However, few scholars have established an art English word list or examined the lexical needs of art majors reading research articles. This study aims to study English vocabulary in art research articles, examine the coverage of the GSL and the AWL words in our corpus, and help art majors master the vocabulary needed to understand art research articles better, for further study and involvement in communication between Eastern and Western cultures.

Lexical threshold necessary for reading comprehension
Vocabulary may be a good predictor of reading comprehension (Laufer, 1992; Nation, 2006). A reader with a bigger vocabulary can understand reading materials better. According to Alderson (1984), there is a lexical threshold below which readers cannot employ their reading strategies to read in L2. Thus, it is vital for researchers and educators to know the lexical threshold for various kinds of reading comprehension.
Researchers (e.g. Laufer, 1989; Nation & Waring, 1997; Hu & Nation, 2000) have investigated the relation between L2 vocabulary knowledge and reading comprehension. They claimed that readers needed to know a certain percentage of the running words in a text for better comprehension. For instance, Laufer (1989) stated that readers needed to know 95% of the running words to understand a text reasonably well. This indicates that a reader who knows fewer than 95% of the running words cannot read the text effectively without help. To read for pleasure, Hirsh and Nation (1992) suggested that learners master 98% of the running words in a text. Later studies recommended 95% of the running words as necessary for minimal comprehension, which requires 4000-5000 word families, and 98% for optimal comprehension, which requires 8000 word families (Nation & Waring, 1997; Nation, 2006; Laufer & Ravenhorst-Kalovski, 2010).

Previous studies on word lists
Investigations of word lists and vocabulary are abundant thanks to advances in computer science and internet technology. Michael West (1953) established the General Service List (GSL), which included the 2000 most frequent words. Studies showed that the GSL accounted for a large proportion of the words in various reading materials. For example, Sutarsyah et al. (1994) found that the coverage of the GSL was 82.5% in an English for Special Purposes (ESP) corpus and 78.4% in an English for Academic Purposes (EAP) corpus. Billuroglu and Neufeld (2005) reported a similar finding: the GSL accounted for about 80% of the running words in any written text. Beginners and less proficient learners are generally advised to pay more attention to the GSL because its words occur so frequently in written texts.
Further studies showed that the GSL is not sufficient for intermediate and advanced learners who have more reading tasks, especially academic reading. Coxhead and Nation (2001) stated that learners should focus on academic vocabulary once they have mastered the GSL words. Coxhead (2000) developed her Academic Word List (AWL) with 570 word families (i.e. headwords and their inflectional and derivational family members), which account for 10% of the running words in her corpus. This indicates that a reader who masters the AWL knows about 10% of the running words in academic materials such as textbooks or research articles. Since it was built on a wide-ranging corpus, the AWL is widely cited and cross-disciplinary. There are also Medical Word Lists (Wang et al., 2008; Hsu, 2013), Engineering English Word Lists (Ward, 2009; Hsu, 2014), an Agriculture Word List (Martinez et al., 2009), and a Chemistry Academic Word List (Valipouri & Nassaji, 2013). However, there are few studies on word lists for art majors, especially lists that help them understand research articles in their discipline.

The purpose of the present study

Data processing and analysis
All the downloaded samples were originally in PDF format and were converted into plain text (i.e. *.txt) to be processed by the computer program Range, which can be downloaded for free at https://www.victoria.ac.nz/lals/about/staff/paul-nation. Range is a program that "can be used to find the coverage of a text by certain word lists, to create word lists based on frequency and range, and to discover shared and unique vocabulary in several pieces of writing" (Nation, 2005, p. 2). We used Range for lexical analysis because it includes three base lists: the first and second 1000 most frequent words in the GSL, and the AWL words, so that we could examine the coverage of the GSL and the AWL in our corpus. To standardize the research articles in the corpus, figures, tables, notes, acknowledgments, bibliographies, references and appendices were removed so that the texts could be read by the software. To normalize the words, a headword and all its inflections and derivations were counted as one word family. For instance, the word achieve and other forms such as achieves, achieved, achievable, achievement and achievements were counted as one word by the software. Learners who know a headword will master its family members much more easily (Coxhead, 2000). In this study, a word family means a word token.
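The kind of computation described above can be illustrated with a minimal Python sketch. It is not the Range program itself: the tiny `BASE_LISTS` dictionary, the list names, and the sample sentence are hypothetical stand-ins, and only the achieve family is taken from the text. The sketch maps each running word to its headword, then reports the share of tokens covered by each list and the word families found.

```python
import re
from collections import Counter

# Hypothetical toy base lists (headword -> family members); NOT the real GSL/AWL.
BASE_LISTS = {
    "list_one": {
        "the": {"the"},
        "of": {"of"},
        "be": {"be", "is", "are", "was", "were"},
    },
    "list_three": {
        "achieve": {"achieve", "achieves", "achieved", "achievable",
                    "achievement", "achievements"},
    },
}

def tokenize(text):
    """Lowercase the text and return its running words (tokens)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def build_lookup(base_lists):
    """Map every family member to its (list name, headword) pair."""
    lookup = {}
    for list_name, families in base_lists.items():
        for headword, members in families.items():
            for member in members:
                lookup[member] = (list_name, headword)
    return lookup

def coverage(text, base_lists):
    """Return per-list token coverage (%) and the families seen per list."""
    lookup = build_lookup(base_lists)
    tokens = tokenize(text)
    if not tokens:
        return {}, {}
    token_counts = Counter()
    families = {}
    for token in tokens:
        # Words outside every base list are grouped under "not_in_lists".
        list_name, headword = lookup.get(token, ("not_in_lists", token))
        token_counts[list_name] += 1
        families.setdefault(list_name, set()).add(headword)
    percent = {name: 100 * n / len(tokens) for name, n in token_counts.items()}
    return percent, families

sample = "The achievements of the artist were achieved in Florence."
percent, families = coverage(sample, BASE_LISTS)
print(percent)
print(families)
```

In this toy run, achievements and achieved both count toward the single family achieve, which mirrors the normalization step: coverage is measured over running words, while list size is measured in word families.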
To analyze the data, we first determined the frequency and distribution of word tokens and types in the ARAC. The data were processed by Range, which shows the coverage of the words in the ARAC. Then, we examined the words that were not on the GSL or the AWL but occurred with high frequency, and analyzed their usage and coverage. We followed Coxhead's (2000) procedure, so we included only the words that occurred in all four sub-corpora and at least 5 times in each discipline.
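The selection rule in the last sentence can be sketched as a simple filter. This is an illustrative assumption about how such a criterion might be coded, not the authors' procedure: the sub-corpus names (`field_a` to `field_d`), the function `select_families`, and all counts are hypothetical; only the threshold of 5 occurrences in each of four sub-corpora comes from the text.

```python
from collections import Counter

MIN_PER_SUBCORPUS = 5  # threshold stated in the text

def select_families(subcorpus_counts, min_count=MIN_PER_SUBCORPUS):
    """Keep word families that reach min_count in every sub-corpus."""
    all_families = set().union(*(c.keys() for c in subcorpus_counts.values()))
    return sorted(
        family for family in all_families
        # Counter returns 0 for a family that is absent from a sub-corpus.
        if all(counts[family] >= min_count for counts in subcorpus_counts.values())
    )

# Hypothetical family frequencies in four hypothetical sub-corpora.
counts = {
    "field_a": Counter({"canvas": 12, "motif": 6, "fresco": 9}),
    "field_b": Counter({"canvas": 7, "motif": 5}),
    "field_c": Counter({"canvas": 5, "motif": 4, "fresco": 11}),
    "field_d": Counter({"canvas": 8, "motif": 10, "fresco": 6}),
}

print(select_families(counts))
```

A family that passes this filter necessarily occurs at least 20 times overall (4 sub-corpora × 5 occurrences), while a family that is frequent in only some sub-corpora is excluded however high its total count.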

The coverage of base word lists in the art research article corpus
In the corpus, there were 1 950 693 running words, 78457 word types and 4112 word families. A total of 2287 word families met the word selection criterion (≥20 occurrences). Table 1 shows the total coverage of the base word lists in the ARAC. The first 1000 most frequent GSL words accounted for 73.21% of lexical coverage, with 593 word families in the ARAC. The second 1000 most frequent words had coverage of 6.74%, with 412 word families. In total, the first 2000 GSL words accounted for 79.95% coverage. The results show that mastery of the top 2000 GSL words matters greatly for reading and writing art research articles. A learner who has mastered these 1005 word families could understand almost 80% of the running words in art research articles. To raise cumulative coverage further, learners should learn more words. Word list Three is the AWL, which accounted for 10.57% of lexical coverage in the ARAC with 485 word families. This percentage was higher than the average 10% (Coxhead, 2000) and the 9.96% reported for chemistry (Valipouri & Nassaji, 2013). According to Wang et al. (2008), the AWL accounted for 12.24% of lexical coverage in their medical academic corpus, which was much higher than its coverage in the ARAC. If the learner knows the meaning and usage of the AWL words, his or her lexical coverage would expand by a wide margin (i.e. 10% on average). Added to the former 79.95%, this gives the learner 90.52% of lexical coverage in the ARAC. The results indicate that, having mastered the 485 AWL word families as well, the learner could understand 90.52% of the running words of art research articles after training in reading strategies.
To gain more lexical coverage, it is vital for art majors to learn more proper nouns, which covered 2.35% with 234 headwords in the ARAC. Many proper nouns have very high frequencies in the corpus, such as countries noted for their cultural heritage and art achievements, like China (326 times), America (673 times), and Italy (267 times), and popular themes or periods like Islam (274 times), Catholic (310 times), and Renaissance (410 times). With these proper nouns added, learners' vocabulary would reach 92.87% of lexical coverage in the ARAC.
Table 1 also shows words that were not in any of the lists; we refer to these as non-GSL and non-AWL words. They accounted for 4.14% of lexical coverage in the ARAC. It is essential for art majors to know more of these words for better reading comprehension. If they know these non-GSL and non-AWL words, they could reach 97.01% of lexical coverage in reading art research articles, which exceeds the 95% needed for a reasonable understanding.
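The cumulative figures quoted in this section can be checked with a short running sum. The sketch below uses only the percentages reported in the text; the layer labels and function names are illustrative choices, not terms from Range.

```python
from itertools import accumulate

# Coverage figures (%) as reported in the text for the ARAC.
LAYERS = [
    ("GSL first 1000", 73.21),
    ("GSL second 1000", 6.74),
    ("AWL", 10.57),
    ("Proper nouns", 2.35),
    ("Non-GSL and non-AWL words", 4.14),
]

def cumulative_coverage(layers):
    """Return (layer name, cumulative coverage rounded to 2 decimals)."""
    names = [name for name, _ in layers]
    totals = accumulate(percent for _, percent in layers)
    return [(name, round(total, 2)) for name, total in zip(names, totals)]

def first_layer_reaching(layers, threshold):
    """Return the first layer at which cumulative coverage meets threshold."""
    for name, total in cumulative_coverage(layers):
        if total >= threshold:
            return name
    return None

for name, total in cumulative_coverage(LAYERS):
    print(f"{name}: {total:.2f}%")
print(first_layer_reaching(LAYERS, 95.0))
```

The running totals reproduce the reported 79.95%, 90.52%, 92.87% and 97.01%, and show that the 95% threshold is crossed only once the non-GSL and non-AWL words are included.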
We identified 2242 high frequency word families, which together reach 95% of lexical coverage in the ARAC. This means that art majors need to master these 2242 word families to understand art research articles.

Comparison of the GSL and the AWL in the ARAC
We compared the GSL words in the ARAC with those in the GSL itself. The top 30 words of both lists are function words, such as the, to, that, with, this and have. There are some exceptions. One is the word which, which is in the top 30 list in the ARAC with 6524 occurrences, while it was listed 40th with 2286 occurrences in the GSL. That is because our corpus consisted of research articles, which are more academic texts with longer and more complicated sentences, and the word which is frequently used to modify antecedents. We removed the function words that are considered too general, such as articles (a, an and the), some prepositions (in, on, at, of and to), pronouns (such as it, I, we, and they), and numbers (one, two, three, etc.). We then examined the top 10 words, as shown in Table 2. Only the word which occurred with high frequency in both lists, while the other words were entirely different. Art majors therefore need to know the high frequency words in art research articles for efficient vocabulary acquisition.
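The filtering and ranking step described here can be sketched as follows. This is only an illustration under stated assumptions: the stoplist contains just the function words named in the text (the source's "etc." is not expanded), the function name `top_content_words` is hypothetical, and every frequency except the 6524 occurrences of which is invented for the example.

```python
from collections import Counter

# Function words the text names as removed; the "etc." is not expanded here.
FUNCTION_WORDS = {
    "a", "an", "the",               # articles
    "in", "on", "at", "of", "to",   # some prepositions
    "it", "i", "we", "they",        # pronouns
    "one", "two", "three",          # numbers
}

def top_content_words(freq, n=10, stoplist=FUNCTION_WORDS):
    """Drop stoplisted words, then return the n most frequent remaining words."""
    kept = Counter({w: c for w, c in freq.items() if w.lower() not in stoplist})
    return kept.most_common(n)

# Hypothetical frequencies, except "which" (6524), which is reported in the text.
toy_freq = Counter({
    "the": 90000, "of": 60000, "which": 6524,
    "art": 5000, "work": 4000, "we": 3000,
})

print(top_content_words(toy_freq, n=3))
```

Running the same function on two frequency lists, one from the corpus and one from a reference list, yields two rankings that can be laid side by side in the manner of Table 2.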
Besides the GSL words, we compared the top 10 academic words in the ARAC and in the AWL. Table 3 shows the top 10 words in the AWL on the right side; these words have the highest frequencies in a cross-disciplinary corpus of humanities and sciences texts (Brown et al., 2013). The words on the left side of Table 3 are the highest frequency ones in the ARAC. The words on the two sides were quite different from each other. In total, we identified 485 academic word families with high frequency in our corpus. We compared our words with the AWL and found that many of the AWL words did not occur, or had low frequency, in our corpus. Even the high frequency AWL words in the ARAC appear in a different frequency order. This indicates that educators and researchers need to establish discipline-specific vocabulary lists so that students can master the vocabulary needed to read and write in their field (Hyland & Tse, 2007; Wang et al., 2008).

Conclusion
The study was based on the art research article corpus, which we built by collecting articles from sixteen journals. We used the software Range to analyze the data and compared word frequencies with the three base word lists. We identified 2242 word families with 95% of lexical coverage in the ARAC. We also found overlaps and differences among the GSL, the AWL and our corpus. These findings offer implications for language educators and material designers who establish discipline-specific word lists; students will benefit from a specific word list and acquire the words efficiently.
The study is only a preliminary one on the vocabulary of art research articles. Because of the limited accessibility of data downloadable from the internet, we collected data from only four main sub-disciplines of art, so the corpus is comparatively small and confined to research articles. For further study, a larger corpus with more art research articles and art textbooks would be valuable. These efforts will provide more implications for language teachers, especially for those in the ESP field, and students would benefit much from the specific word list for better reading and writing in their fields.

References
and Sports Majors: From the Perspective of Connectionism. Heilongjiang Researches on Higher Education, 251 (3), 174-176.
Zhang, L. (2010). Strategies and Survey of College English Teaching to Art Students. Heilongjiang Education, 6, 91-92.
Zhang, Y. J., Wang, L. J., Shen, T. T., & Liu, C. X. (2008). Survey and Discussion on ESP Teaching and Learning of the Fashion Design Major. Journal of Zhejiang Sci-Tech University, 25 (6), 771-775.

Table 1: The coverage of three base word lists in ARAC
Note: Word list One is the first 1000 frequent GSL word list, List Two is the second 1000 frequent GSL word list, and List Three is the AWL. Not in the list means the words were non-GSL and non-AWL.

Table 2: Comparison of top 10 GSL words in the ARAC and in the GSL

Table 3: Comparison of top 10 academic words in the ARAC and in the AWL