Functional Writing in the Primary Years: Protocol for a Mixed-Methods Writing Intervention Study

This protocol article describes the project Functional Writing in the Primary Years, which received funding in late 2018 and was started in August 2019. The Functional Writing in Primary School (FUS) project aims to increase the quality of teaching and learning writing in the first years of schooling. A large-scale, mixed-methods study, the FUS project investigates the effects of an early start with functional writing, focusing on young students’ development as writers and their ability to use writing as a tool for learning and communication. The project also investigates teachers’ writing instruction and professional development. The protocol describes the project’s rationale and major methodological aspects and culminates in a concluding discussion about possible caveats.


learners benefit most from instruction that combines learning alphabetic codes with authentic writing for a variety of purposes in ways that are meaningful for the students (Hall, 2013). In other words, the available research provides strong support for writing instruction that builds on the notion of functional writing. However, Norwegian and international investigations have indicated that writing is almost nonexistent in preschool (Gerde et al., 2012) and only receives a small amount of time and development in the primary grades (Connor et al., 2013; Håland et al., 2019; Morrow et al., 2011; Roth & Guinee, 2011).
Although writing has been defined as a key competence in the national Norwegian curriculum for one and a half decades, there have been few large-scale Norwegian studies exploring writing instruction and assessing the effect of implementing writing instruction programs. The tendency, which seems present in the other Nordic countries as well, has been for researchers to engage in ethnographically inspired explorations of writing instruction with small samples or to conduct quantitative studies using larger samples but focusing on technical aspects (Hagtvet et al., 2019; Skar & Tengberg, 2014). Consequently, there is a need for knowledge concerning how teachers learn and use tools to promote functional writing. There is also a need for knowledge concerning classroom practices where functional writing is promoted and, lastly, for knowledge of whether such a focus can be deemed more effective than the established practices (or, as it were, business as usual). Such knowledge is important for preparing schools and teaching students, as stakeholders place high demands on the quality of learning processes. Unfortunately, there is currently a lack of knowledge concerning these areas. To meet these challenges and to provide the sector and teacher education with relevant knowledge, the FUS project's primary objective is to investigate the consequences of an early start with functional writing on young students' writing proficiency (see Skar et al., 2020 for construct definitions), teachers' professional learning, and learning activities.
The lack of research and the need for knowledge about functional writing in the early grades call for investigations that can produce evidence-based knowledge on the effects of focusing on functional writing using a set of classroom activities (including play and formal instruction) and on formative writing assessments (Kvithyld & Aasen, 2011). In turn, this requires intervening in schools using a randomized controlled trial design to introduce teachers to tools for teaching functional writing from the start of year 1. Thus, the FUS project's first secondary objective is to investigate the effects that a writing instruction intervention, focusing on functional writing, has on students' writing proficiency.
Based on the previous research, we anticipate that this intervention will promote writing proficiency. Still, it is also anticipated that children will follow different trajectories in their development as writers. The FUS project will examine the diversity in children's paths to writing proficiency through observations of writing activities, responses to writing instruction, and analyses of written texts. Consequently, the second secondary objective will be to describe and explain students' development as writers.
Generally, there is a current focus on teachers' lifelong learning (cf. Parr et al., 2007). To gain in-depth insight into the mechanisms behind the results of the intervention, it is necessary to investigate how and what teachers learn during the project. Because the project can be characterized as a professional development (PD) project, given that it entails "purposeful, to some extent face-to-face, formalized and organized learning and/or training opportunity for in-service teachers" (Kalinowski et al., 2019, p. 2), it is of great interest to also investigate teachers' professional development. Such investigations will enable analyses of the relationships between the systematic and unsystematic variations in teacher development and the results of the intervention. A similar proposition may be put forward regarding instructional practices. Any local adaptation of resources will entail a variation in instructional practices, which is relevant to the interpretation and use of the intervention's results. In sum, knowledge about professional development processes and altered instructional practices is required to facilitate an in-depth understanding of the mechanisms behind the results of the intervention. Thus, the third secondary objective is to describe and explain teachers' professional development and writing instruction practices during the project.

The Intervention
The intervention consists of a professional development program and peer activity sessions for the participating teachers, following steps that will be repeated throughout the project (Desimone & Garet, 2015; Desimone & Pak, 2017). There will be ten "project weeks" for each semester of the program, which will run for two years (i.e., 40 project weeks). The program consists of three types of activities:
• Instructional activities (IA) (including assessment): teachers performing instructional activities in the classroom. The IAs are designed as instructions directed to the teacher, who needs to interpret and operationalize them. The amount of specification varies, but teachers are given detailed instructions about major steps (e.g., what to do before, during, and after students write), and the IAs are explained in detail at the network sessions (see below). An example of an IA is "Make Your Own Superhero". Teachers are instructed to let students use material to create a physical representation of a superhero (SH) and to write a backstory about the SH, before both the SH and the backstory are displayed at a vernissage for parents. The IAs contain descriptions of each of the steps, but the teacher is encouraged to make adaptations as they see fit with regard to the needs of the student group. The instructional activities are modeled on a prior project conducted by the Norwegian Centre for Writing Education and Writing Research. They include several aspects of what has proven to be good writing instruction practice, for instance: writing frequently, creating a supportive classroom, exploring the purpose of writing, and modeling and teaching strategies for achieving that purpose as well as teaching more foundational skills (Graham & Harris, 2018). The program also includes a component in which students learn letters at a "fast pace" (>1 letter per week), which has proven to be effective (Sunde et al., 2019).
• Teachers' learning activities (TLA): teachers engaging in activities aimed at teacher development/learning. TLAs include but are not limited to reading material, listening to podcasts, and writing reflective notes. Teachers will also use a newly developed assessment tool that highlights eight aspects of writing quality, which is presumed to broaden and nuance their understanding of text quality in texts written by young children.
• Network sessions (NS): meetings with facilitators from the project or other project schools to share experiences, etc. An important feature of the NS will be the opportunity to practice new teaching material and get feedback. In particular, feedback has proven to be vital in teachers' professional development (Allen et al., 2011; Kalinowski et al., 2019).
It is anticipated that this program promoting the functional aspects of writing will break with business as usual in two ways. First, we anticipate that teachers, on average, will spend more time on writing instruction in the intervention group than in the control group. Studies of Norwegian teachers' writing instruction practices indicate that as many as 19% of the teachers do not teach writing at all in the first semester of the first grade (Håland et al., 2019), and that on average, teachers engage in writing instruction for 20 minutes per day in grades 1-3 (Graham et al., 2020). Second, we anticipate that the FUS program will be a contrast to the time-honored activities of dictation and other activities focusing exclusively on the mechanical aspects of writing, which teachers in grade 1 report engaging in frequently (Graham et al., 2020). Practically, the program is distributed in electronic booklets and is available in phone, tablet, and desktop formats. Teachers at each school will conduct the learning activities in their classes and engage in peer observations of one another. To increase fidelity, a member of the research team will also visit all teachers once per year, and there is a project "hotline" and an e-forum for teachers to use in case of questions, comments, or discussion. All participating teachers are surveyed each week regarding completion of the current week's activities.

Research Questions
The objectives stated above will be met by implementing the intervention and by answering the following research questions:
1) What are the effects of participating in the project on students' writing proficiency?
2a) What characterizes students' writing competencies and writing development throughout the project?
2b) What characterizes students' development as writers throughout the project?
3a) What characterizes teachers' professional development throughout the project?
3b) What characterizes the instructional practices during the project?

Methodology
Study Design: Intervention through RCT
The intervention started the first day of the first grade (autumn 2019) and will run for two academic years. Two groups are participating in the project: one treatment group and one control group. After the intervention program is completed in the intervention group, the control group will take part in the treatment. To test the sustainability of a potential intervention effect, students in the intervention group will participate in data collection at the end of school year three. Because the control group will have been exposed to the treatment by then, a sustainability baseline test of students' writing proficiency was conducted prior to the intervention start. This test was taken by students in the first to third grades at participating schools. Given the nature of that data, the baseline sustainability test will also be usable as reference data, since it will allow the project to estimate the expected writing proficiency of students unaffected by participation either in the intervention or in the control group. The timeline and various steps are also depicted in Figure 1.
The intervention is running as a randomized controlled trial (RCT) study (Pontoppidan et al., 2018) including 58 schools. Originally, 60 schools opted for participation, but two left the project a few weeks after the startup. The RCT takes place in three large and small regions in Norway (of which two are municipalities). These three regions provide enough variation (e.g., socioeconomic variation, rural/urban schools) for the results to be generalizable to Norway. The number of schools in these regions is higher than the sample required to conduct the RCT. Therefore, the school owners, in collaboration with the project team, decided which schools were to be recruited.

The design resulted in an implementation randomized at the school level within each region. Further, the two municipalities were divided into four groups: one with large schools and one with small schools in each municipality. The stratification resulted in five blocks: region 1 with 2 clusters, region 2 with 1 cluster, and region 3 with 2 clusters. Consequently, each school within each cluster was randomized into either the intervention or the control group, and each region consists of an equal number of intervention and control schools.
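The randomization described above, in which schools are assigned to conditions separately within each block, can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual procedure: the function name `randomize_within_blocks`, the school identifiers, the block sizes (12 schools per block), and the seed are all hypothetical.

```python
import random
from collections import defaultdict


def randomize_within_blocks(schools, seed=2019):
    """Assign schools to 'intervention' or 'control' separately within each block.

    `schools` is a list of (school_id, block) tuples. Both the data layout and
    the seed are hypothetical; they only illustrate blocked randomization.
    """
    rng = random.Random(seed)

    # Group schools by block (here: region-by-size cluster).
    by_block = defaultdict(list)
    for school_id, block in schools:
        by_block[block].append(school_id)

    assignment = {}
    for block in sorted(by_block):
        members = sorted(by_block[block])
        rng.shuffle(members)
        n_intervention = len(members) // 2
        # With an odd number of schools, a coin flip decides which arm
        # receives the extra school.
        if len(members) % 2 == 1 and rng.random() < 0.5:
            n_intervention += 1
        for index, school_id in enumerate(members):
            arm = "intervention" if index < n_intervention else "control"
            assignment[school_id] = arm
    return assignment


# Hypothetical example: five blocks with twelve schools each.
example_schools = [
    (f"school_{block}_{i}", block) for block in range(1, 6) for i in range(12)
]
assignment = randomize_within_blocks(example_schools)
```

Randomizing within blocks rather than across the whole sample keeps the two conditions balanced on the blocking variables (here, region and school size), which is the property the design relies on.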
Because the project has already started, we choose to present data that reflect the schools currently participating in the project, rather than the data available at the time of the randomization. According to publicly available data, the average group size within each cluster was the following for the academic years 2016-2018: Cluster 1 had an average of 49.4 students (SD = 11.1), Cluster 2 had 83.9 students (SD = 14.8), Cluster 3 had 43.3 students (SD = 12.9), Cluster 4 had 34.2 students (SD = 10.4), and Cluster 5 had 69 students (SD = 12.7). As can be seen in Table 1, the average school size after the randomization into control schools and intervention schools, respectively, was quite similar within the clusters.
The stratification method was chosen over an alternative that would have based the strata on scores from the national reading test in the 5th grade (the only publicly available data on school performance). Using any one criterion (size or national test score) risked producing strata that were very different in either size or scores. As it turned out, the average national reading test scores were not very dissimilar between the intervention and control groups within each cluster. An exception is cluster 2, in which the intervention schools, on average, outperformed the control schools on the test. The overall difference was 0.3 points and nonsignificant (U = 392.5, p = .84). The details are presented in Table 1. A preliminary estimate using the PowerUp software (Dong & Maynard, 2013) indicates that the study will have enough power to detect an effect of at least 25.1% of a standard deviation (the minimum detectable effect, MDE). The computation was based on the following values: a harmonic mean of 16 students per class, a harmonic mean of 2.97 classes per school, and a harmonic mean of 11.29 schools per cluster, with a proportion of variance of .15 among Level 2 and Level 3 units, respectively. Further, the proportion of Level 3 units randomized to treatment was set to 55%. A previous meta-analysis of writing programs for elementary school (Graham et al., 2012) found that the treatment effect of previous writing interventions normally exceeds our estimated MDE, suggesting that our sample size is large enough to detect effects of relevant magnitudes.
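To make the inputs to such a power computation concrete, the sketch below implements the standard minimum detectable effect size formula for a three-level design with treatment assigned at the school level and no covariates. It is an illustration only: the function name `mdes_three_level` is hypothetical, the multiplier uses a normal approximation rather than the t-distribution, and the total of 58 schools is taken from the design description. Because the exact PowerUp design module, blocking adjustment, and any covariate assumptions are not stated here, the result is not expected to match the reported MDE.

```python
from math import sqrt
from statistics import NormalDist


def mdes_three_level(n_students, n_classes, n_schools, icc_class, icc_school,
                     prop_treated, alpha=0.05, power=0.80):
    """Approximate minimum detectable effect size (in SD units).

    Assumes students nested in classes nested in schools, random assignment
    at the school level, no covariates, and a two-tailed test. The multiplier
    is a normal approximation (an assumption of this sketch).
    """
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)

    pq = prop_treated * (1 - prop_treated)
    variance = (
        # Between-school variance component
        icc_school / (pq * n_schools)
        # Between-class (within-school) variance component
        + icc_class / (pq * n_classes * n_schools)
        # Student-level residual variance component
        + (1 - icc_class - icc_school)
        / (pq * n_students * n_classes * n_schools)
    )
    return multiplier * sqrt(variance)


# Inputs taken from the text above; 58 is the total number of schools.
example_mdes = mdes_three_level(
    n_students=16, n_classes=2.97, n_schools=58,
    icc_class=0.15, icc_school=0.15, prop_treated=0.55,
)
```

The formula shows why the number of schools dominates the computation: the between-school term shrinks only as schools are added, whereas adding students per class affects only the smallest term.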

Participants: recruitment and sample size
The respective regions' school owners are official and paid collaborators in the project and have been important assets in the participating schools' recruitment. To recruit schools, the project team met with all the headmasters in the three regions to inform them about the project. The headmasters later reported their interest to school owners or were contacted by school owners with an inquiry to participate. The sampling procedure was thus characterized by self-selection. Sixty schools agreed to participate but, as noted, two decided to drop out due to major staff changes. Several groups of participants are involved in the project. This subsection describes the three main groups: students participating in the sustainability baseline test and students and teachers participating in the project. Other groups consisting of subsamples of students and teachers participating in studies and answering research questions 2-3 are described later in the article.

Participants in the sustainability baseline test
The sustainability test (see chapter 5.1) was administered by teachers at the participating schools. Of the 6,604 students who participated in the sustainability baseline test, 2,305 were from the first grade, 2,139 were from the second grade, and 2,160 were from the third grade. The exact number of students included in the study, however, may change since there may be blank responses.

Participants in the project
Based on public data on group size accessed through Grunnskolens informasjonssystem [The Information System for Compulsory School; www.gsi.no], the average group size in the participating schools has been M = 56.8 (SD = 21.8) students in the last three years, equaling an estimated sample of 3,296 students. Given an average of three groups per 55 students, the estimated number of participating teachers will be approximately 175.

Ethics
The proposed project will be carried out according to the detailed ethical standards of the British Association of Applied Linguistics (BAAL, 2006) and the guidelines outlined by the Norwegian National Research Ethics Committees (2016). The project has been approved by the Norwegian Centre for Research Data (NSD). All student participation will be based on written, informed consent from relevant parents or guardians, and all available data will be digitized only after anonymization and the removal of indexical information that may lead to participant identification. Parents or guardians of students who will be observed by video recordings will sign a specific consent form detailing the video observations.

Premeasures and Outcome Measures
Answering RQ1
One of the main evaluations of the project will be to gauge the effect of the intervention on students' writing proficiency. All students participating in the main project will take part in four writing tests, each containing at least three tasks:
• one copy task measuring students' ability to copy text under time constraints;
• two writing tasks prompting students to write about a certain topic (e.g., favorite subject in school).
The first test (i.e., with the three tasks described above) will serve as the baseline test, and tests 2-4 will be used to gauge the effect at different time points (please refer to Figure 1). All tests will be administered by the students' teachers, a decision based on two factors. First, the project lacks the resources to concurrently administer writing tests across all sites. Second, and at least as important, the students' own teacher is expected to offer more psychological safety than external test administrators. The project has, however, recorded video instructions that detail the test procedures for the teacher to play back to the students. The teachers have been instructed to show the video instructions or to use them as models for their own administration. The copy task yields information about students' "writing fluency," as it were, and the hypothesis, based on previous studies, is that fluency is positively associated with writing quality, presumably because greater fluency will free cognitive capacity for processing and producing content (Graham et al., 1997). The writing tasks will yield information about text quality, more specifically about text quality seen through the lens of eight assessment scales capturing quality aspects such as audience awareness, organization, vocabulary, spelling, and letter formation. It is anticipated that the hypothesized increased focus on writing and the activities in the FUS project will lead to higher fluency as well as greater text quality for students in the intervention schools. To get further insights into whether time spent on writing instruction increases as a result of participation in the FUS project, teachers in both conditions are surveyed digitally once a week with questions regarding how much time they spend on teaching transcription skills and on discursive skills, respectively.
To measure text quality, a new assessment tool, which in a validating phase produced reliability well over .85, was developed. All texts will be rated on eight rating scales (Audience Awareness, Vocabulary, Organization of Content, Language Use, Punctuation, Spelling, Handwriting, and Relevance), and all assessments will be processed using contemporary statistical techniques in writing assessment (i.e., many-facet Rasch measurement; Skar, 2017). The texts will be rated by a panel of trained raters led by the project's principal investigator. All texts will be masked, so that the raters are unaware of the condition assignment.
To gauge the project's effect, multilevel modeling (Snijders & Bosker, 2012) will be used. Granted that all students participate in all writing tests, each test will yield some 6,400 texts and 3,200 copy tasks, in total 25,600 texts and 12,800 copy tasks.
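As an illustration of what such a multilevel model might look like, the following is a generic three-level random-intercept specification with students nested in classes nested in schools. It is a sketch under assumptions, not the specification the project has committed to: the symbols, the inclusion of a baseline covariate, and the omission of block effects are all illustrative choices.

```latex
% Illustrative three-level random-intercept model (assumed, not the authors' stated specification).
% Y_{ijk}: outcome (e.g., text quality) for student i in class j in school k
% T_k:     1 if school k is in the intervention condition, 0 otherwise
% X_{ijk}: baseline score for the same student
\begin{equation}
Y_{ijk} = \gamma_{0} + \gamma_{1} T_{k} + \gamma_{2} X_{ijk} + v_{k} + u_{jk} + e_{ijk},
\qquad
v_{k} \sim N(0, \sigma^{2}_{v}), \quad
u_{jk} \sim N(0, \sigma^{2}_{u}), \quad
e_{ijk} \sim N(0, \sigma^{2}_{e})
\end{equation}
```

In a model of this form, the coefficient \gamma_{1} carries the estimated intervention effect, while the school-level and class-level random intercepts v_{k} and u_{jk} account for the clustering of students.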

Answering RQ2a, RQ2b
To answer RQ2a, the FUS project will reuse the texts collected through the writing tests. The research group will analyze a representative sample of texts from both conditions to investigate what characterizes writing development in the project vs. the control schools. The team will develop an analytical model based on established text-linguistic, linguistic, and discourse analytic methods. Depending on resources, between 10% and 20% of the texts will be analyzed. The texts will be sampled to represent all ability levels, as measured in the writing tests.
To answer RQ2b, the project will build on a long tradition of ethnographic literacy research (e.g., Dyson, 1989, 1993; Fast, 2007; Heath, Street, & Mills, 2008) by observing and interviewing a relatively small sample of students: two or three in half the intervention schools (i.e., n = 30-45), two periods per semester. The observations will focus on student actions (e.g., what they write, how they write, with whom they write, and if they talk about writing), while the interviews, using the observations and texts as stimulus material, will focus on the learners' reflections on their writing actions. The research group will develop an observation protocol for this work package. The researchers will, for example, ask the students to describe and reflect on observed actions in their own words. With regard to the structure of the interviews, the researchers will pilot both individual and focus group modes and choose the option that provides the most information while also preserving psychological safety.
Answering RQ3a, RQ3b
Teachers' professional development will be investigated using a range of data collection methods to ensure triangulation and nuanced understandings. All teachers in the project (including those in the control group) will participate in a teacher survey that was also carried out on a nationally representative sample of teachers in grades 1-3 in Norway (Graham et al., 2020). The survey contains items concerning the type of writing taught and the kinds of classroom activities used, as well as items about teachers' beliefs and self-efficacy. The survey will fulfill multiple purposes: first, it will enable the comparison of teachers in the project schools (both intervention and control) to a nationally representative sample of teachers at the start of the project; second, it will enable the comparison of teachers in the intervention and control groups at the start of the project; and, third, it will enable the tracking of any changes within and between the intervention and control groups at the ends of Stages 1 and 2 (cf. Figure 1).
Teachers' professional development will also be investigated using observations and interviews. Depending on the local conditions, such as the number of classes, one or two teachers at each participating school (i.e., n = 29-58) will be observed and interviewed on at least one occasion per semester (i.e., k observations and interviews = 120-240). Observations will be made using previously developed observation protocols for writing instruction that fit the activities in the FUS program (Henk et al., 2003). This protocol will let the researcher observe the mode of instruction (e.g., lectures, play, workshops, writing), instructional focus (e.g., formal or functional aspects of writing or content aspects), and teacher-student interaction. Researchers will observe writing instruction by being present, and also video record lessons. Furthermore, a subgroup of teachers (n = 5) will be followed more closely (i.e. observed more frequently).
The project will also collect a teaching diary focusing on classroom activities, using categories from the observation protocol. All participating teachers (i.e., both in the intervention and control schools) will fill out the teaching diary, which will then be used to analyze which activities are occurring in both conditions and which are strongly associated with either one of the conditions. The teachers will be asked to focus on the most recent three weeks of instruction. The diary will be collected via the Internet one time per semester.

Personnel as of March 2020 (alphabetically listed)
Principal investigator Gustaf B. Skar, Ph.D., Professor, Department of Teacher Education, Norwegian University of Science and Technology

Some Challenges When Conducting Writing Interventions in Schools
Changing classroom practices is the key to raising students' writing proficiency, and it requires teachers to engage students in new classroom activities (e.g., formal instruction, games, role-plays, etc.) and to use assessment tools for formative purposes. Previous writing instruction intervention studies have revealed several highly effective components. Seminal reviews by Graham and Perin (2007), Graham et al. (2012), and Hillocks (1986) have shown that certain strategies and feedback practices can be highly effective, with effect sizes ranging from d = 1.17 (strategy instruction; Graham et al., 2012) to d = 0.80 (feedback from adults; Graham et al., 2012). However, some of these effect studies can be characterized by a small sample of students and a short duration and did not always include teachers. In a worst-case scenario, this can be problematic. First, research has shown that the sustainable, positive development of writing proficiency is associated with long-term intervention and with having teachers as the primary facilitators (Hall, 2013). Second, there is always a risk of low ecological validity when implementing a program under experiment-like conditions. There are also several examples of writing interventions and professional learning programs that involve teachers and have long durations, like the FUS project. One such example is the NORM project, which included a two-year intervention and reported significant effect sizes (Berge et al., 2019). The project shared features with what has been documented to be the best practices within professional learning: content focus, active participation, coherence, duration, and collective participation (Desimone, 2009; Desimone & Garet, 2015). The NORM project evaluated the intervention through measures of student writing but did not, however, include a strong methodology for evaluating teachers' professional learning or the fidelity of the intervention. The latter is also important when designing programs such as this (cf. Desimone, 2002), as such problems can increase the risk of low internal validity.
Norwegian schools are diverse, and there is a long tradition of placing a vast amount of trust in teachers' and schools' capabilities for using steering documents to develop meaningful and beneficial instruction. The external validity of any intervention program in the Norwegian context, including the FUS project, is contingent upon letting teachers interpret and use externally developed tools. We anticipate that even with facilitation from researchers, some differences in how the resources are interpreted and implemented are inevitable; these differences are a necessary part of teachers adapting resources for local needs. Other differences may stem from idiosyncratic use. Studies of students' writing proficiency indicate a large variation between classes and between schools (e.g., Skar, 2017). This variation will be accounted for by including schools from different contexts to increase the generalizability of the results.

Author biography
Gustaf Bernhard Uno Skar holds a Ph.D. in educational science from Stockholm University. He is currently professor at the Department of Teacher Education, and part-time professor at the National Centre for Writing Education and Research.
Arne Johannes Aasen holds a Ph.D. in literature and is director of the Norwegian Centre for Writing Education and Research (The Writing Centre). He is also co-investigator in the FUS project: Functional Writing in Primary School.
(1976) is an associate professor at the Department of Teacher Education, NTNU. He has a Master's degree in literature from UiT and a PhD in applied linguistics from NTNU.