返回首页 《从零开始学 Python》(第二版)

预备

基本数据类型

语句和文件

函数

错误和异常

模块

保存数据

实战

用Tornado做网站

科学计算

结尾

文件(2)

上一节,对文件有了初步认识。读者要牢记,文件无非也是一种类型的数据。

文件的状态

很多时候,我们需要获取一个文件的有关状态(也称为属性),比如创建日期,访问日期,修改日期,大小,等等。在 os 模块中,有这样一个方法,专门让我们查看文件的这些状态参数的。

>>> import os
>>> file_stat = os.stat("131.txt")      #查看这个文件的状态
>>> file_stat                           #文件状态是这样的。从下面的内容,有不少从英文单词中可以猜测出来。
posix.stat_result(st_mode=33204, st_ino=5772566L, st_dev=2049L, st_nlink=1, st_uid=1000, st_gid=1000, st_size=69L, st_atime=1407897031, st_mtime=1407734600, st_ctime=1407734600)

>>> file_stat.st_ctime                  #这个是文件创建时间
1407734600.0882277                      

这是什么时间?看不懂!别着急,换一种方式。在 Python 中,有一个模块 time,是专门针对时间设计的。

>>> import time                         
>>> time.localtime(file_stat.st_ctime)  #这回看清楚了。
time.struct_time(tm_year=2014, tm_mon=8, tm_mday=11, tm_hour=13, tm_min=23, tm_sec=20, tm_wday=0, tm_yday=223, tm_isdst=0)

read/readline/readlines

上节中,简单演示了如何读取文件内容,但是,在用 dir(file)的时候,会看到三个函数:read/readline/readlines,它们各自有什么特点,为什么要三个?一个不行吗?

在读者向下看下面内容之前,请想一想,如果要回答这个问题,你要用什么方法?注意,我问的是用什么方法能够找到答案,不是问答案内容是什么。因为内容,肯定是在某个地方存放着呢,关键是用什么方法找到。

搜索?是一个不错的方法。

还有一种,就是在交互模式下使用的,你肯定也想到了。

>>> help(file.read)

用这样的方法,可以分别得到三个函数的说明:

read(...)
    read([size]) -> read at most size bytes, returned as a string.

    If the size argument is negative or omitted, read until EOF is reached.
    Notice that when in non-blocking mode, less data than what was requested
    may be returned, even if no size parameter was given.

readline(...)
    readline([size]) -> next line from the file, as a string.

    Retain newline.  A non-negative size argument limits the maximum
    number of bytes to return (an incomplete line may be returned then).
    Return an empty string at EOF.

readlines(...)
    readlines([size]) -> list of strings, each a line from the file.

    Call readline() repeatedly and return a list of the lines so read.
    The optional size argument, if given, is an approximate bound on the
    total number of bytes in the lines returned.

对照一下上面的说明,三个的异同就显现了。

EOF 什么意思?End-of-file。在维基百科中居然有对它的解释:

In computing, End Of File (commonly abbreviated EOF[1]) is a condition in a computer operating system where no more data can be read from a data source. The data source is usually called a file or stream. In general, the EOF is either determined when the reader returns null as seen in Java's BufferedReader,[2] or sometimes people will manually insert an EOF character of their choosing to signal when the file has ended.

明白 EOF 之后,就对比一下:

  • read:如果指定了参数 size,就按照该指定长度从文件中读取内容,否则,就读取全文。被读出来的内容,全部塞到一个字符串里面。这样有好处,就是东西都到内存里面了,随时取用,比较快捷;“成也萧何败萧何”,也是因为这点,如果文件内容太多了,内存会吃不消的。文档中已经提醒注意在“non-blocking”模式下的问题,关于这个问题,不是本节的重点,暂时不讨论。
  • readline:那个可选参数 size 的含义同上。它则是以行为单位返回字符串,也就是每次读一行,依次循环,如果不限定 size,直到最后一个返回的是空字符串,意味着到文件末尾了(EOF)。
  • readlines:size 同上。它返回的是以行为单位的列表,即相当于先执行 readline(),得到每一行,然后把这一行的字符串作为列表中的元素塞到一个列表中,最后将此列表返回。

依次演示操作,即可明了。有这样一个文档,名曰:you.md,其内容和基本格式如下:

You Raise Me Up When I am down and, oh my soul, so weary; When troubles come and my heart burdened be; Then, I am still and wait here in the silence, Until you come and sit awhile with me. You raise me up, so I can stand on mountains; You raise me up, to walk on stormy seas; I am strong, when I am on your shoulders; You raise me up: To more than I can be.

分别用上述三种函数读取这个文件。

>>> f = open("you.md")
>>> content = f.read()
>>> content
'You Raise Me Up\nWhen I am down and, oh my soul, so weary;\nWhen troubles come and my heart burdened be;\nThen, I am still and wait here in the silence,\nUntil you come and sit awhile with me.\nYou raise me up, so I can stand on mountains;\nYou raise me up, to walk on stormy seas;\nI am strong, when I am on your shoulders;\nYou raise me up: To more than I can be.\n'
>>> f.close()

提示:养成一个好习惯,只要打开文件,不用该文件了,就一定要随手关闭它。如果不关闭它,它还驻留在内存中,后面又没有对它的操作,是不是浪费内存空间了呢?同时也增加了文件安全的风险。

注意:在 Python 中,'\n'表示换行,这也是 UNIX 系统中的规范。但是,在奇葩的 windows 中,用'\r\n'表示换行。Python 在处理这个的时候,会自动将'\r\n'转换为'\n'。

请仔细观察,得到的就是一个大大的字符串,但是这个字符串里面包含着一些符号 \n,因为原文中有换行符。如果用 print 输出这个字符串,就是这样的了,其中的 \n 起作用了。

>>> print content
You Raise Me Up
When I am down and, oh my soul, so weary;
When troubles come and my heart burdened be;
Then, I am still and wait here in the silence,
Until you come and sit awhile with me.
You raise me up, so I can stand on mountains;
You raise me up, to walk on stormy seas;
I am strong, when I am on your shoulders;
You raise me up: To more than I can be.

readline()读取,则是这样的:

>>> f = open("you.md")
>>> f.readline()
'You Raise Me Up\n'
>>> f.readline()
'When I am down and, oh my soul, so weary;\n'
>>> f.readline()
'When troubles come and my heart burdened be;\n'
>>> f.close()

显示出一行一行读取了,每操作一次 f.readline(),就读取一行,并且将指针向下移动一行,如此循环。显然,这种是一种循环,或者说可迭代的。因此,就可以用循环语句来完成对全文的读取。

#!/usr/bin/env Python
# coding=utf-8

f = open("you.md")

while True:
    line = f.readline()
    if not line:         #到 EOF,返回空字符串,则终止循环
        break
    print line ,         #注意后面的逗号,去掉 print 语句后面的 '\n',保留原文件中的换行

f.close()                #别忘记关闭文件

将其和文件"you.md"保存在同一个目录中,我这里命名的文件名是 12701.py,然后在该目录中运行 Python 12701.py,就看到下面的效果了:

~/Documents$ python 12701.py 
You Raise Me Up
When I am down and, oh my soul, so weary;
When troubles come and my heart burdened be;
Then, I am still and wait here in the silence,
Until you come and sit awhile with me.
You raise me up, so I can stand on mountains;
You raise me up, to walk on stormy seas;
I am strong, when I am on your shoulders;
You raise me up: To more than I can be.

也用 readlines()来读取此文件:

>>> f = open("you.md")
>>> content = f.readlines()
>>> content
['You Raise Me Up\n', 'When I am down and, oh my soul, so weary;\n', 'When troubles come and my heart burdened be;\n', 'Then, I am still and wait here in the silence,\n', 'Until you come and sit awhile with me.\n', 'You raise me up, so I can stand on mountains;\n', 'You raise me up, to walk on stormy seas;\n', 'I am strong, when I am on your shoulders;\n', 'You raise me up: To more than I can be.\n']

返回的是一个列表,列表中每个元素都是一个字符串,每个字符串中的内容就是文件的一行文字,含行末的符号。显而易见,它是可以用 for 来循环的。

>>> for line in content:
...     print line ,
... 
You Raise Me Up
When I am down and, oh my soul, so weary;
When troubles come and my heart burdened be;
Then, I am still and wait here in the silence,
Until you come and sit awhile with me.
You raise me up, so I can stand on mountains;
You raise me up, to walk on stormy seas;
I am strong, when I am on your shoulders;
You raise me up: To more than I can be.
>>> f.close()

读很大的文件

前面已经说明了,如果文件太大,就不能用 read()或者 readlines()一次性将全部内容读入内存,可以使用 while 循环和 readlin()来完成这个任务。

此外,还有一个方法:fileinput 模块

>>> import fileinput
>>> for line in fileinput.input("you.md"):
...     print line ,
... 
You Raise Me Up
When I am down and, oh my soul, so weary;
When troubles come and my heart burdened be;
Then, I am still and wait here in the silence,
Until you come and sit awhile with me.
You raise me up, so I can stand on mountains;
You raise me up, to walk on stormy seas;
I am strong, when I am on your shoulders;
You raise me up: To more than I can be.

我比较喜欢这个,用起来是那么得心应手,简洁明快,还用 for。

对于这个模块的更多内容,读者可以自己在交互模式下利用 dir()help()去查看明白。

还有一种方法,更为常用:

>>> for line in f:
...     print line ,
... 
You Raise Me Up
When I am down and, oh my soul, so weary;
When troubles come and my heart burdened be;
Then, I am still and wait here in the silence,
Until you come and sit awhile with me.
You raise me up, so I can stand on mountains;
You raise me up, to walk on stormy seas;
I am strong, when I am on your shoulders;
You raise me up: To more than I can be.

之所以能够如此,是因为 file 是可迭代的数据类型,直接用 for 来迭代即可。

seek

这个函数的功能就是让指针移动。特别注意,它是以字节为单位进行移动的。比如:

>>> f = open("you.md")
>>> f.readline()
'You Raise Me Up\n'
>>> f.readline()
'When I am down and, oh my soul, so weary;\n'

现在已经移动到第四行末尾了,看 seek()的能力:

>>> f.seek(0)

意图是要回到文件的最开头,那么如果用 f.readline()应该读取第一行。

>>> f.readline()
'You Raise Me Up\n'

果然如此。此时指针所在的位置,还可以用 tell()来显示,如

>>> f.tell()
17L
>>> f.seek(4)

f.seek(4)就将位置定位到从开头算起的第四个字符后面,也就是"You "之后,字母"R"之前的位置。

>>> f.tell()
4L

tell()也是这么说的。这时候如果使用 readline(),得到就是从当前位置开始到行末。

>>> f.readline()
'Raise Me Up\n'
>>> f.close()

seek()还有别的参数,具体如下:

seek(...) seek(offset[, whence]) -> None. Move to new file position.

Argument offset is a byte count. Optional argument whence defaults to 0 (offset from start of file, offset should be >= 0); other values are 1 (move relative to current position, positive or negative), and 2 (move relative to end of file, usually negative, although many platforms allow seeking beyond the end of a file). If the file is opened in text mode, only offsets returned by tell() are legal. Use of other offsets causes undefined behavior. Note that not all file objects are seekable.

whence 的值:

  • 默认值是 0,表示从文件开头开始计算指针偏移的量(简称偏移量)。这是 offset 必须是大于等于 0 的整数。
  • 是 1 时,表示从当前位置开始计算偏移量。offset 如果是负数,表示从当前位置向前移动,整数表示向后移动。
  • 是 2 时,表示相对文件末尾移动。

总目录   |   上节:文件(1)   |   下节:迭代

如果你认为有必要打赏我,请通过支付宝:qiwsir@126.com,不胜感激。

上一篇: 文件(1) 下一篇: 迭代