SCALES FOR THE MEASUREMENT OF ETHOS

James C. McCroskey

The Pennsylvania State University

Since the days of Corax and Tisias rhetorical theorists have been concerned with the role of ethos in communication.¹ In recent years ethos, sometimes referred to as credibility or prestige, has been a frequent variable for study or control in experimental research in speech, psychology, sociology, and education.² In many of the early studies differences in ethos levels were assumed. In more recent studies the ethos level has usually been measured. The methods of measuring ethos levels have included rankings, sociograms, "prestige indexes," linear rating scales, Thurstone-type attitude scales, and devices similar to Likert scaling techniques, including the semantic differential.³

Construction and scoring of attitude scales like those just mentioned are time-consuming, but by using Likert scales and the resources of modern computers much time can be saved. The usual five-choice, strongly-agree to strongly-disagree Likert scale lends itself to machine scoring. Subjects may indicate their responses to scale items on standard IBM answer sheets. The answer sheets can be run through a Digitex scoring machine which will punch the subject's responses on IBM cards so a computer may be used for actual scoring.
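
The scoring step just described can be illustrated with a minimal sketch. The article does not state the numeric weights given to the five responses, nor does it list which items are reverse-keyed; the weights below (A = 5 through E = 1) and the set of negatively worded items (read off the wording of Table I) are assumptions made only for illustration, and the function names are hypothetical.

```python
# Hypothetical scoring sketch. The weights and the reverse-keying rule are
# assumptions for illustration; they are not taken from the article.
WEIGHTS = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}

# Items on the authoritativeness scale that are worded unfavorably
# (Table I numbering), as judged from the item wording.
NEGATIVE_ITEMS = {2, 5, 8, 10, 11, 13, 14, 16, 19, 20, 22}


def score_item(item_number, response, negative_items=NEGATIVE_ITEMS):
    """Return a 1-5 score for one response, reversing unfavorable items."""
    weight = WEIGHTS[response.upper()]
    return 6 - weight if item_number in negative_items else weight


def score_scale(responses, negative_items=NEGATIVE_ITEMS):
    """Sum the item scores; `responses` maps item number to a letter A-E."""
    return sum(score_item(n, r, negative_items) for n, r in responses.items())


# A subject who strongly agrees with every favorable item and strongly
# disagrees with every unfavorable item receives the maximum, 22 * 5.
most_favorable = {n: ("E" if n in NEGATIVE_ITEMS else "A") for n in range(1, 23)}
print(score_scale(most_favorable))  # prints 110
```

Under these assumptions a higher total indicates a more favorable rating of the speaker on the scale being scored.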

The following reports the development of Likert scales to measure ethos which can be scored in the above manner.

PROCEDURE

The literature of speech and psychology was surveyed to locate terms used in reference to ethos, credibility, and prestige. Thirty terms most frequently used to describe this construct became the items composing the original scale for measuring ethos. Introductions for two hypothetical speakers were developed: one for a presumably high-ethos source, the other for a presumably low-ethos source. Each introduction was read to fifty subjects.⁴ A tape recorder was present, and the subjects were led to believe that they were to hear a tape-recorded speech by the person introduced. Immediately after the introduction the subjects completed the ethos scale. The results were scored, correlated, and factor analyzed.

Factor analysis produced two significant factors. The first, which can be described as an "authoritativeness" factor, accounted for 47% of the variance. The second factor, which can be described as a "character" factor, accounted for 29% of the variance. While this finding of two-factoredness is consistent with findings of most other researchers,⁵ it should be noted that the theoretical "factor" of ethos characterized as "good will" by Aristotle and others and as "intention" toward the listener by Hovland, Janis, and Kelley did not appear. At least two of the items on the scale (see items 6 and 14 in Table II) would appear to measure a part of this theoretical factor. Since these items were loaded heavily on the character factor, one might speculate that the theoretical "good will" or "intention" factor is not separate from authoritativeness and character.

A factor analytic study reported by Berlo and Lemert using semantic differential scales identified three factors for the ethos construct.⁶ These factors were "competence," "trustworthiness," and "dynamism." The first two correspond with the factors designated above as "authoritativeness" and "character." While the "dynamism" factor did not appear in the studies reported below, this should not be interpreted as an indication that it does not exist. An examination of the scale items used by this writer indicates that there were no items which appear to be directed toward this factor.

TABLE I

Authoritativeness Scale

Instructions: Please indicate your response to the following items on the IBM answer sheet provided. Interpret the possible responses as follows: A--Strongly Agree, B--Agree, C--Undecided, D--Disagree, E--Strongly Disagree.
1.	I respect this speaker's opinion on the topic. t = 11.870 r = .801
2.	This speaker is not of very high intelligence. t = 11.584 r = .737
3.	This speaker is a reliable source of information on the topic. t = 13.848 r = .899
4.	I have confidence in this speaker. t = 12.135 r = .833
5.	This speaker lacks information on the subject. t = 9.152 r = .782
6.	This speaker has high status in our society. t = 10.535 r = .701
7.	I would consider this speaker to be an expert on the topic. t = 12.112 r = .869
8.	This speaker's opinion on the topic is of little value. t = 6.799 r = .775
9.	I believe that this speaker is quite intelligent. t = 10.728 r = .843
10.	The speaker is an unreliable source of information on the topic. t = 10.970 r = .801
11.	I have little confidence in this speaker. t = 16.067 r = .862
12.	The speaker is well-informed on this subject. t = 8.095 r = .846
13.	The speaker has low status in our society. t = 11.123 r = .688
14.	I would not consider this speaker to be an expert on this topic. t = 11.457 r = .857
15.	This speaker is an authority on the topic. t = 11.638 r = .867
16.	This speaker has had very little experience with this subject. t = 7.849 r = .802
17.	This speaker has considerable knowledge of the factors involved with this subject. t = 10.582 r = .861
18.	Few people are as qualified to speak on this topic as this speaker. t = 5.566 r = .717
19.	This speaker is not an authority on the topic. t = 10.616 r = .862
20.	This speaker has very little knowledge of the factors involved with the subject. t = 11.471 r = .833
21.	This speaker has had substantial experience with this subject. t = 9.683 r = .851
22.	Many people are much more qualified to speak on this topic than this speaker. t = 8.500 r = .797

The significance of the "dynamism" factor in persuasive communication is yet to be established. Although two studies have been reported which used the Berlo and Lemert scales, neither investigated the effect of the various ethos factors on other variables in communication.⁷ There is reason to believe that this factor is not a significant element in ethos for persuasive communication. If we agree that ethos is the "attitude toward a speaker held by a listener," we would expect the factors of that attitude to be consistent with the factors of other attitudes. In the extensive research reported by Osgood and others, the "evaluative" dimension was found to be representative of attitude. This led to a definition of attitude in terms of this evaluative dimension. The principle of congruity, based on this definition, has since been demonstrated as a reliable means of predicting attitude change toward speakers and concepts in persuasive communication.⁸ If "dynamism" were a significant factor in persuasive communication it should have confounded the congruity studies. Apparently it did not.

TABLE II

Character Scale

Instructions: Please indicate your response to the following items on the IBM answer sheet provided. Interpret the possible responses as follows: A--Strongly Agree, B--Agree, C--Undecided, D--Disagree, E--Strongly Disagree.
1.	I deplore this speaker's background. t = 11.969 r = .748
2.	This speaker is basically honest. t = 6.811 r = .734
3.	I would consider it desirable to be like this speaker. t = 9.949 r = .770
4.	This speaker is not an honorable person. t = 7.769 r = .768
5.	This speaker is a reputable person. t = 8.845 r = .698
6.	This speaker is not concerned with my well-being. t = 6.389 r = .733
7.	I trust this speaker to tell the truth about the topic. t = 11.082 r = .803
8.	This speaker is a scoundrel. t = 6.703 r = .807
9.	I would prefer to have nothing at all to do with this speaker. t = 4.621 r = .701
10.	Under most circumstances I would be likely to believe what this speaker says about the topic. t = 9.525 r = .780
11.	I admire the speaker's background. t = 12.765 r = .790
12.	This speaker is basically dishonest. t = 4.012 r = .754
13.	The reputation of this speaker is low. t = 6.297 r = .724
14.	I believe that this speaker is concerned with my well-being. t = 7.896 r = .804
15.	This speaker is an honorable person. t = 5.762 r = .707
16.	I would not prefer to be like this speaker. t = 8.777 r = .792
17.	I do not trust the speaker to tell the truth on this topic. t = 8.665 r = .828
18.	Under most circumstances I would not be likely to believe what this speaker says about the topic. t = 12.460 r = .845
19.	I would like to have this speaker as a personal friend. t = 8.313 r = .754
20.	The character of this speaker is good. t = 7.513 r = .797

TABLE III

Semantic Differential Scales

Authoritativeness			Character
	1.	Reliable-Unreliable		1.	Honest-Dishonest
	2.	Informed-Uninformed		2.	Friendly-Unfriendly
	3.	Qualified-Unqualified		3.	Pleasant-Unpleasant
	4.	Intelligent-Unintelligent		4.	Unselfish-Selfish
	5.	Valuable-Worthless		5.	Nice-Awful
	6.	Expert-Inexpert		6.	Virtuous-Sinful

Therefore, until research is reported indicating the significance of the "dynamism" factor in persuasive communication, there is justification for assuming that the significance of ethos in persuasive communication lies in the "evaluative" dimension.

The question therefore arises as to why two factors were found to exist in the above study. To determine whether this finding was an artifact of the Likert scaling approach, further study utilizing forty evaluative semantic differential items was conducted. Using essentially the same procedure as described above for the Likert scales, factor analysis again produced two significant factors. The "authoritativeness" factor accounted for 52% of the variance and the "character" factor accounted for 19% of the variance. The items with high and pure loadings on the two factors were similar in nature to those reported by Berlo and Lemert.⁹ However, two of their items were excluded from the items selected by this writer because more than 50% of the subjects checked the neutral point on these items.

It would appear from these investigations that the "evaluative" factor when applied to speakers breaks into two factors which we may label "authoritativeness" and "character." Summing across these two factors to arrive at a total "ethos" score, as would be possible if the evaluative factor held together, should be avoided until such time as further research indicates the feasibility of this procedure.

It should be noted that the items with high and pure loadings on the "character" factor include items which appear to measure the theoretical "intention" or "good will" factor. (See items 2 and 4 in Table III.) This further suggests that this theoretical factor is not distinct from the other two observed factors of ethos.

Operating on the assumption that "authoritativeness" and "character" are the constituent parts of the ethos construct, the writer developed separate Likert and semantic differential scales to measure each of these factors. Fourteen new items were added to the original thirty Likert items so that each Likert scale would include twenty-two items. The six semantic differential items with the highest and purest loadings on each factor were selected to constitute the semantic differential scales. To obtain estimates of item discrimination, reliability, and validity the Likert scales were used in seven experiments. To obtain estimates of concurrent validity and reliability of the semantic differential scales, these were included in the final experiment.

Experiment 1

Introductions of three speakers had been developed prior to the construction of the scales. These presumably represented a high, a middle, and a low ethos source. One hundred forty-three subjects listened to one of the introductions and immediately completed both Likert scales. In addition they completed a revised version of the Andersen Authoritativeness Scale.¹⁰

Experiment 2

Forty-three subjects were instructed to "identify in your mind the speaker whom you would be most likely to believe other things being equal." Forty-three other subjects were instructed similarly, except that they were to imagine the speaker they would be "least likely" to believe. All subjects completed the same scales as in Experiment 1.

Experiment 3

Three new introductions were developed. As in Experiment 1, these presumably represented a high, a middle, and a low ethos source. One hundred eleven subjects listened to one of the introductions and immediately completed the two Likert scales.

Experiment 4

Two versions of a speech advocating abolition of capital punishment and two versions of a speech advocating federal control of education were developed. One form of each speech made extensive use of documented evidence. The other contained no documentation or qualification. The speeches were presented to two hundred forty-three subjects, each subject hearing one speech. An additional fifty subjects heard one speech on each topic. The speeches were tape recorded and presented with no information as to the source of the communication. Immediately after hearing the speech the subjects completed both Likert scales. Differences on the authoritativeness scale between the two versions of each speech were predicted.

Experiment 5¹¹

Introductions for three sources were developed: a labor leader, a management leader, and an economics professor. Two opinion statements were developed: one pro-labor, one anti-labor. One hundred thirty-three subjects read various combinations of these introductions and opinion statements and immediately completed both Likert scales. Differences on both scales were predicted between sources and opinion statements.

Experiment 6

The capital punishment speeches used in Experiment 4 were presented to one hundred twenty-five high school students participating in the Summer High School Speech Institute at The Pennsylvania State University, each student hearing one speech. Immediately after hearing the speech the subjects completed both Likert scales. A difference on the authoritativeness scale between the two versions of the speech was predicted.

Experiment 7

Introductions of eight speakers, which presumably represented varying ethos levels, were presented to two hundred eighteen subjects. After hearing an introduction each subject completed both Likert scales and both semantic differential scales.

RESULTS

Experiment 1

The three speakers were rated high, middle, and low as expected. The differences between speakers were significant at the .001 level for all three forms. Item discrimination was checked by two methods, item-total correlations and "t" tests. It was decided that item-total correlations should be a minimum of .5. All items on the authoritativeness scale met this criterion. All but two of the items on the character scale met it. The "t" tests were run for all items between the high and low sources. The .001 level was set for acceptance of an item. A "t" of 3.646 was needed for significance at the .001 level. All items on the authoritativeness scale met this criterion. All but two of the items on the character scale met it, the same two that failed to meet the item-total correlation criterion. Tables I and II report the accepted items composing the two scales, the item-total correlations, and "t's" for each item.
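
The two item-discrimination checks described above can be sketched as follows. The thresholds (.5 for the item-total correlation and 3.646 for "t") are those stated in the text; everything else is an assumption for illustration. In particular, the article does not say whether the item was removed from the total before correlating or which form of the "t" test was used, so this sketch correlates each item with the uncorrected total and uses the pooled-variance "t" for independent groups. The function names are hypothetical.

```python
import math

MIN_ITEM_TOTAL_R = 0.5   # criterion stated in the text
CRITICAL_T = 3.646       # value the text gives for the .001 level


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def item_total_r(scores, item):
    """Correlate one item (column index) with the uncorrected total score.

    `scores` holds one list of item scores per subject (an assumed layout).
    """
    item_scores = [row[item] for row in scores]
    totals = [sum(row) for row in scores]
    return pearson_r(item_scores, totals)


def pooled_t(high, low):
    """Pooled-variance t for item scores under the high and low sources."""
    n1, n2 = len(high), len(low)
    m1, m2 = sum(high) / n1, sum(low) / n2
    ss1 = sum((v - m1) ** 2 for v in high)
    ss2 = sum((v - m2) ** 2 for v in low)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


def item_accepted(r, t):
    """Apply both criteria; an item must pass each to be retained."""
    return r >= MIN_ITEM_TOTAL_R and abs(t) >= CRITICAL_T
```

With the values reported in Table I, for example, `item_accepted(0.801, 11.870)` returns True for item 1.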

Factor analysis indicated only one significant interpretable factor on each scale. A second factor accounting for 5% of the variance on the character scale was uninterpretable. This factor correlated highly with factor one (.833). The split-halves reliability estimate for the authoritativeness scale was .978. The Hoyt Internal Consistency Reliability estimate¹² was .975. The correlation with the Andersen scale was .917.

The split-halves reliability estimate for the character scale was .966. The Hoyt estimate was .961. The correlation with the Andersen scale was .365. The correlation between the authoritativeness and character scales was .521.
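
The two kinds of reliability estimate reported here can be sketched as below. The article does not describe how the halves were formed or whether the half-test correlation was stepped up, so the odd-even split and the Spearman-Brown correction are assumptions. The Hoyt estimate follows the usual analysis-of-variance form, one minus the ratio of the residual mean square to the between-subjects mean square, computed from a subjects-by-items table. The function names and the data layout (one list of item scores per subject) are hypothetical.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_half_reliability(scores):
    """Odd-even split-half estimate with the Spearman-Brown correction.

    Both the odd-even split and the correction are assumptions.
    """
    first_half = [sum(row[0::2]) for row in scores]   # 1st, 3rd, 5th ... items
    second_half = [sum(row[1::2]) for row in scores]  # 2nd, 4th, 6th ... items
    r = pearson_r(first_half, second_half)
    return 2 * r / (1 + r)


def hoyt_reliability(scores):
    """Hoyt estimate from a two-way (subjects x items) analysis of variance."""
    n, k = len(scores), len(scores[0])
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    ss_total = sum((v - grand_mean) ** 2 for row in scores for v in row)
    ss_subjects = k * sum((sum(row) / k - grand_mean) ** 2 for row in scores)
    ss_items = n * sum(
        (sum(row[j] for row in scores) / n - grand_mean) ** 2 for j in range(k)
    )
    ss_residual = ss_total - ss_subjects - ss_items
    ms_subjects = ss_subjects / (n - 1)
    ms_residual = ss_residual / ((n - 1) * (k - 1))
    return (ms_subjects - ms_residual) / ms_subjects


# Small made-up data set: four subjects, four items.
demo = [[5, 4, 5, 4], [4, 4, 3, 4], [2, 3, 2, 2], [1, 2, 2, 1]]
print(round(split_half_reliability(demo), 3), round(hoyt_reliability(demo), 3))
```

The demonstration data are invented solely to exercise the two functions and bear no relation to the values reported in this study.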

Experiment 2

The hypothetical high source was found to be very significantly higher (.0005 level) than the hypothetical low source on all three scales. All items discriminated well beyond the .001 level except for the two items which were found not to discriminate in Experiment 1. Item-total correlations were similarly high.

Experiment 3

The three speakers were rated high, middle, and low as expected. The differences between speakers were significant at the .001 level for both forms. The two items on the character scale which were found not to discriminate were omitted in this experiment. All remaining items met the criteria set in Experiment 1. The split-halves reliability estimate for the authoritativeness scale was .962. The Hoyt estimate was .968. The split-halves estimate for the character scale was .945. The Hoyt estimate was .939. The correlation between the two scales was .534.

Experiment 4

Predicted differences between the evidence and no-evidence speeches were confirmed for the authoritativeness dimension for both topics. No differences were expected in the character dimension and none was found. Factor analysis again indicated only one significant interpretable factor on each scale.¹³ With N = 343, the split-halves reliability estimate for the authoritativeness scale was .951. The Hoyt estimate was .943. The split-halves estimate for the character scale was .940. The Hoyt estimate was .928. The correlation between the scales was .323. The mean inter-item correlation across the two scales was .369.

Experiment 5

All hypotheses in this study were confirmed on the basis of the differences measured by these two scales. The split-halves reliability estimate for the authoritativeness scale was .957. The Hoyt estimate was .953. The split-halves estimate for the character scale was .940. The Hoyt estimate was .936. The correlation between the scales was .690. The mean inter-item correlation was .391.

Experiment 6

The predicted difference between speeches on the authoritativeness dimension was not confirmed. No difference was predicted on the character dimension and none was found. An explanation for the lack of difference on the authoritativeness dimension could be that the experimenter's prestige confounded the results. In Experiment 4 the experimenter was unknown to the subjects. In this experiment the experimenters were the institute director and one of the teachers in the institute. The ratings on both dimensions of ethos for this experiment were significantly higher than those in Experiment 4 (from .01 to .0005 for the various combinations).

The split-halves reliability estimate for the authoritativeness scale was .944. The Hoyt estimate was .946. The split-halves estimate for the character scale was .930. The Hoyt estimate was .932. The correlation between the scales was .708. The mean inter-item correlation was .213.

Experiment 7

The eight speakers were rated as expected on the two Likert scales and on the two semantic differential scales. The split-halves reliability estimate for the Likert authoritativeness scale was .969. The Hoyt estimate was .964. The split-halves estimate for the Likert character scale was .979. The Hoyt estimate was .968. The Hoyt reliability estimates for the authoritativeness and character semantic differential scales were .933 and .922 respectively.

The correlation between the Likert and semantic differential authoritativeness scales was .851. The correlation for the two character scales was .817. Factor analysis indicated only one significant factor on each of the four scales. The amount of variance accounted for by the significant factor on each scale was as follows: Likert authoritativeness, 62%; Likert character, 63%; semantic differential authoritativeness, 70%; semantic differential character, 65%.

CONCLUSIONS

On the basis of the above experiments it can be concluded that these scales are capable of reliably measuring either initial or terminal ethos on the two dimensions of authoritativeness and character. Whether they can be used to measure change from initial to terminal ethos remains to be tested.

The results of Experiments 4 and 5 are of particular importance. Because the scales were originally developed as measures of initial ethos created by introductions, the possibility of accidentally biasing the results by constructing the introductions so as to manipulate the precise factors the scales were designed to measure was present. In Experiment 4, however, this bias could not have influenced the results because the speaker was not introduced at all. Experiment 5 is important because it was conducted by another researcher who was not familiar with the scales or their development. This study was designed to test other hypotheses and the validation of the ethos scales was of secondary importance. Thus, it is most unlikely that accidental bias, which could have been present in Experiments 1, 3, and 7, was present in Experiments 4 and 5.

There are four relevant indications of validity for the Likert scales. First, the content of the items and the procedure used in their selection tend to indicate that they are representative samplings of the universe of items pertaining to the construct of ethos. Second, the authoritativeness scale correlates highly with the Andersen authoritativeness scale. The correlation between the character scale and the Andersen scale is relatively low. It appears that the character and authoritativeness scales measure primarily different things and the authoritativeness scale measures primarily the same things as the Andersen scale. Third, both scales measured the hypothetical ethos levels projected in Experiment 2. Fourth, all the hypotheses in the other six studies (with the exception of Experiment 6) were confirmed by the scores derived from these two scales. These four indications suggest that these are valid instruments for the measurement of the authoritativeness and character dimensions of ethos.

The results of Experiment 7 indicate the reliability of the two semantic differential scales for measuring authoritativeness and character. The high correlations between the Likert and semantic differential scales are an indication of concurrent validity. Whatever the Likert scales measure, the semantic differential scales appear to measure equally well. Since there is considerable justification for believing that the Likert scales are valid measures of the authoritativeness and character dimensions of ethos, we can also conclude that the semantic differential scales are valid measures of these dimensions.

Since all four scales are easy to administer, the choice of which scales to use in a given experiment could well depend on the equipment available. If the experimenter has access to a Digitex machine, the use of the Likert scales would eliminate the necessity for hand punching or hand scoring of the obtained data. If, however, a Digitex machine is not available, use of the semantic differential scales would substantially reduce the time required for hand punching or hand scoring. Fortran computer programs for scoring the Likert scales from Digitex punched cards and for scoring the semantic differential scales from hand punched cards will be provided by the writer upon request.

NOTES

1. William M. Sattler, "Conceptions of Ethos in Ancient Rhetoric," Speech Monographs, XIV (1947), 55-65.

2. Kenneth Andersen and Theodore Clevenger, Jr., "A Summary of Experimental Research in Ethos," Speech Monographs, XXX (June, 1963), 59-78.

3. Ibid., pp. 74-77.

4. All subjects used in the research here reported (except Experiment 6) were selected from students enrolled in Speech 200 at The Pennsylvania State University. This is a required course in oral communication. Students from all colleges in the University are enrolled and the classes include students at all undergraduate levels.

5. See, for example, C. I. Hovland, I. L. Janis, and H. H. Kelley, Communication and Persuasion (New Haven, 1953), pp. 19-55.

6. David K. Berlo and James B. Lemert, "A Factor Analytic Study of the Dimensions of Source Credibility." Paper presented at the 1961 convention of the SAA, New York.

7. Eldon E. Baker, "The Immediate Effects of Perceived Speaker Disorganization on Speaker Credibility and Audience Attitude Change in Persuasive Speaking," Western Speech, XXIX (Summer, 1965), 148-161; and Gerald R. Miller and Murray A. Hewgill, "The Effect of Variations in Nonfluency on Audience Ratings of Source Credibility," QJS, L (February, 1964), 33-44.

8. Charles E. Osgood, George J. Suci, and Percy H. Tannenbaum, The Measurement of Meaning (Urbana, 1957). See also, David K. Berlo and Halbert E. Gulley, "Some Determinants of the Effect of Oral Communication in Producing Attitude Change and Learning," Speech Monographs, XXIV (March, 1957), 10-20.

9. Op. cit. The "competence" factor included experienced-inexperienced, expert-ignorant, trained-untrained, and competent-incompetent. The "trustworthiness" factor included just-unjust, kind-cruel, admirable-contemptible, and honest-dishonest. The "dynamism" factor included aggressive-meek, bold-timid, energetic-tired, and extroverted-introverted.

10. This scale was developed to measure ethos of speakers discussing the "farm problem." It was modified by omitting references to the "farm problem" and inserting references to "problems of education." For a discussion of the original scale see Kenneth E. Andersen, "An Experimental Study of the Interaction of Artistic and Non-Artistic Ethos in Persuasion," unpublished dissertation (Wisconsin, 1961).

11. This experiment was conducted by William E. Arnold, Instructor in Speech at The Pennsylvania State University.

12. This reliability estimate is based on analysis of variance. See J. P. Guilford, Psychometric Methods, 2nd ed. (New York, 1954).

13. A second factor appeared on the character scale which accounted for 7% of the variance. This factor was uninterpretable because the content of the items with high loadings was essentially the same as that of items on factor one. The two factors correlated highly (.886). Items 6 and 13 on the authoritativeness scale had significant but not high loadings on the only significant factor.
