2.4 Byte Buffers

In this section, we'll take a closer look at byte buffers. There are buffer classes for all the primitive data types (except boolean), but byte buffers have characteristics not shared by the others. Bytes are the fundamental data unit used by the operating system and its I/O facilities. When moving data between the JVM and the operating system, it's necessary to break down the other data types into their constituent bytes. As we'll see in the following sections, the byte-oriented nature of system-level I/O can be felt throughout the design of buffers and the services with which they interact.
For reference, here is the complete API of ByteBuffer. Some of these methods have been discussed in previous sections and are simply type-specific versions. The new methods will be covered in this and following sections.
    package java.nio;

    public abstract class ByteBuffer extends Buffer implements Comparable
    {
        public static ByteBuffer allocate (int capacity)
        public static ByteBuffer allocateDirect (int capacity)
        public abstract boolean isDirect( );
        public static ByteBuffer wrap (byte[] array, int offset, int length)
        public static ByteBuffer wrap (byte[] array)

        public abstract ByteBuffer duplicate( );
        public abstract ByteBuffer asReadOnlyBuffer( );
        public abstract ByteBuffer slice( );

        public final boolean hasArray( )
        public final byte [] array( )
        public final int arrayOffset( )

        public abstract byte get( );
        public abstract byte get (int index);
        public ByteBuffer get (byte[] dst, int offset, int length)
        public ByteBuffer get (byte[] dst)
        public abstract ByteBuffer put (byte b);
        public abstract ByteBuffer put (int index, byte b);
        public ByteBuffer put (ByteBuffer src)
        public ByteBuffer put (byte[] src, int offset, int length)
        public final ByteBuffer put (byte[] src)

        public final ByteOrder order( )
        public final ByteBuffer order (ByteOrder bo)

        public abstract CharBuffer asCharBuffer( );
        public abstract ShortBuffer asShortBuffer( );
        public abstract IntBuffer asIntBuffer( );
        public abstract LongBuffer asLongBuffer( );
        public abstract FloatBuffer asFloatBuffer( );
        public abstract DoubleBuffer asDoubleBuffer( );

        public abstract char getChar( );
        public abstract char getChar (int index);
        public abstract ByteBuffer putChar (char value);
        public abstract ByteBuffer putChar (int index, char value);

        public abstract short getShort( );
        public abstract short getShort (int index);
        public abstract ByteBuffer putShort (short value);
        public abstract ByteBuffer putShort (int index, short value);

        public abstract int getInt( );
        public abstract int getInt (int index);
        public abstract ByteBuffer putInt (int value);
        public abstract ByteBuffer putInt (int index, int value);

        public abstract long getLong( );
        public abstract long getLong (int index);
        public abstract ByteBuffer putLong (long value);
        public abstract ByteBuffer putLong (int index, long value);

        public abstract float getFloat( );
        public abstract float getFloat (int index);
        public abstract ByteBuffer putFloat (float value);
        public abstract ByteBuffer putFloat (int index, float value);

        public abstract double getDouble( );
        public abstract double getDouble (int index);
        public abstract ByteBuffer putDouble (double value);
        public abstract ByteBuffer putDouble (int index, double value);

        public abstract ByteBuffer compact( );

        public boolean equals (Object ob)
        public int compareTo (Object ob)
        public String toString( )
        public int hashCode( )
    }
Note: Bytes Are Always Eight Bits, Right?

These days, bytes are almost universally recognized as being eight bits. But this wasn't always the case. In ages past, bytes ranged anywhere from 3 to 12 or more bits each, with the most common being 6 to 9 bits. The eight-bit byte was arrived at through a combination of practicality and market forces. It's practical because eight bits are enough to represent a usable character set (English characters anyway), eight is a power of two (which makes hardware design simpler), eight neatly holds two hexadecimal digits, and multiples of eight provide enough combined bits to store useful numeric values. The market force was IBM. The IBM 360 mainframe, first introduced in the 1960s, used eight-bit bytes. That pretty much settled the matter. For further background, consult the man himself, Bob Bemer of IBM.

(Translator's notes: "English characters anyway" indeed; the author isn't thinking about Chinese characters. The web page the original pointed to is gone, which makes one wonder about the average lifespan of a web page. Bob Bemer has since passed away; this post is partly in his memory.)
2.4.1 Byte Ordering

The nonbyte primitive types, except for boolean, are composed of several bytes grouped together. The data types and their sizes are summarized in Table 2-1. (Booleans represent one of two values: true or false. A byte can take on 256 unique values, so a boolean cannot be unambiguously mapped to one or several bytes. Bytes are the building blocks from which all buffers are constructed. The NIO architects determined that implementation of boolean buffers would be problematic, and the need for such a buffer type was debatable anyway.)
Table 2-1. Primitive data types and sizes

| Data type | Size (in bytes) |
|---|---|
| Byte | 1 |
| Char | 2 |
| Short | 2 |
| Int | 4 |
| Long | 8 |
| Float | 4 |
| Double | 8 |
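The sizes in Table 2-1 can be checked from Java itself. The sketch below assumes Java 8 or later, where each wrapper class exposes a `BYTES` constant; the class name `PrimitiveSizes` is made up for illustration.

```java
// Lists the size in bytes of each primitive type that has a buffer class.
// Assumes Java 8+, where the wrapper classes define a BYTES constant.
class PrimitiveSizes {
    // Sizes in the same order as Table 2-1: byte, char, short, int, long, float, double.
    static int[] sizes() {
        return new int[] {
            Byte.BYTES, Character.BYTES, Short.BYTES, Integer.BYTES,
            Long.BYTES, Float.BYTES, Double.BYTES
        };
    }

    public static void main(String[] args) {
        String[] names = {"byte", "char", "short", "int", "long", "float", "double"};
        int[] sizes = sizes();
        for (int i = 0; i < names.length; i++) {
            System.out.println(names[i] + " = " + sizes[i]);
        }
    }
}
```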
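Each of the primitive data types is stored in memory as a contiguous sequence of bytes. For example, the 32-bit int value 0x037FB4C7 (decimal 58,700,999) might be packed into memory bytes as illustrated in Figure 2-14 (memory addresses increasing left to right). Notice the word "might" in the previous sentence. Although the size of a byte has been settled, the order of bytes has not been universally agreed upon. The bytes representing an integer value might just as easily be organized in memory as shown in Figure 2-15.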
Figure 2-14. Big-endian byte order
Figure 2-15. Little-endian byte order
The way multibyte numeric values are stored in memory is commonly referred to as endian-ness. If the numerically most-significant byte of the number, the big end, is at the lower address, then the system is big-endian (Figure 2-14). If the least-significant byte comes first, it's little-endian (Figure 2-15).
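Since the figures are not reproduced here, the two layouts can be recreated with a short sketch. It encodes the example value 0x037FB4C7 into a `ByteBuffer` under each byte order and dumps the bytes from the lowest address upward. The class and method names are illustrative only.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class EndianLayout {
    // Encodes value using the given byte order and returns the bytes,
    // lowest address first, as space-separated hex pairs.
    static String layout(int value, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES).order(order);
        buf.putInt(value);
        StringBuilder sb = new StringBuilder();
        for (byte b : buf.array()) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        int value = 0x037FB4C7;
        System.out.println(layout(value, ByteOrder.BIG_ENDIAN));    // 03 7F B4 C7
        System.out.println(layout(value, ByteOrder.LITTLE_ENDIAN)); // C7 B4 7F 03
    }
}
```

The first line corresponds to the big-endian arrangement of Figure 2-14, the second to the little-endian arrangement of Figure 2-15.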
Endian-ness is rarely a choice for software designers; it's usually dictated by the hardware design. Both types of endian-ness, sometimes known as byte sex, are in widespread use today. There are good arguments for both approaches. Intel processors use the little-endian design. The Motorola CPU family, Sun Sparc, and PowerPC CPU architectures are all big-endian.
The question of byte order even transcends CPU hardware design. When the architects of the Internet were designing the Internet Protocol (IP) suite to interconnect all types of computers, they recognized the problem of exchanging numeric data between systems with differing internal byte orders. Therefore, the IP specifications define a notion of network byte order, which is big-endian. (Internet terminology refers to bytes as octets. As mentioned in the sidebar, the size of a byte can be ambiguous. By using the term "octet," the IP specifications explicitly mandate that bytes consist of eight bits.) All multibyte numeric values used within the protocol portions of IP packets must be converted between the local host byte order and the common network byte order.
In java.nio, byte order is encapsulated by the ByteOrder class:
    package java.nio;

    public final class ByteOrder
    {
        public static final ByteOrder BIG_ENDIAN
        public static final ByteOrder LITTLE_ENDIAN

        public static ByteOrder nativeOrder()
        public String toString()
    }
The ByteOrder class defines the constants that determine which byte order to use when storing or retrieving multibyte values from a buffer. The class acts as a type-safe enumeration.
It defines two public fields that are preinitialized with instances of itself. Only these two instances of ByteOrder ever exist in the JVM, so they can be compared using the == operator: since no other instances can be created, comparing references is enough to tell which byte order you have. If you need to know the native byte order of the hardware platform the JVM is running on, invoke the nativeOrder( ) static class method. It will return one of the two defined constants. Calling toString( ) returns a String containing one of the two literal strings BIG_ENDIAN or LITTLE_ENDIAN.

(Translator's note: what would this pattern be called, a "doubleton"?)
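A minimal sketch of how these pieces fit together follows. It only uses the `ByteOrder` members listed above; the class name `ByteOrderDemo` and the helper `isLittleEndianHost` are invented for the example, and the value printed for the native order depends on the machine it runs on.

```java
import java.nio.ByteOrder;

class ByteOrderDemo {
    // Reference comparison is safe because only two ByteOrder instances exist.
    static boolean isLittleEndianHost() {
        return ByteOrder.nativeOrder() == ByteOrder.LITTLE_ENDIAN;
    }

    public static void main(String[] args) {
        // Platform dependent: prints BIG_ENDIAN or LITTLE_ENDIAN.
        System.out.println("native order: " + ByteOrder.nativeOrder());
        System.out.println("little-endian host? " + isLittleEndianHost());
        // toString() yields the literal name of the constant.
        System.out.println(ByteOrder.BIG_ENDIAN.toString());
    }
}
```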
Every buffer class has a current byte-order setting that can be queried by calling order( ):
    public abstract class CharBuffer extends Buffer
        implements Comparable, CharSequence
    {
        // This is a partial API listing

        public final ByteOrder order( )
    }
This method returns one of the two constants from ByteOrder. For buffer classes other than ByteBuffer, the byte order is a read-only property and may take on different values depending on how the buffer was created. Except for ByteBuffer, buffers created by allocation or by wrapping an array will return the same value from order( ) as ByteOrder.nativeOrder( ) does. This is because the elements contained in the buffer are directly accessed as primitive data within the JVM.
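The following sketch checks that claim for an allocated `CharBuffer` and a wrapped `IntBuffer`. The class and method names are illustrative; the buffer sizes are arbitrary.

```java
import java.nio.ByteOrder;
import java.nio.CharBuffer;
import java.nio.IntBuffer;

class TypedBufferOrder {
    // An allocated non-byte buffer reports the platform's native order.
    static boolean allocatedMatchesNative() {
        return CharBuffer.allocate(8).order() == ByteOrder.nativeOrder();
    }

    // The same holds for a buffer that wraps an existing array.
    static boolean wrappedMatchesNative() {
        return IntBuffer.wrap(new int[4]).order() == ByteOrder.nativeOrder();
    }

    public static void main(String[] args) {
        System.out.println(allocatedMatchesNative()); // true
        System.out.println(wrappedMatchesNative());   // true
    }
}
```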
The ByteBuffer class is different: the default byte order is always ByteOrder.BIG_ENDIAN regardless of the native byte order of the system. Java's default byte order is big-endian, which allows things such as class files and serialized objects to work with any JVM. This can have performance implications if the native hardware byte order is little-endian. Accessing ByteBuffer content as other data types (to be discussed shortly) can potentially be much more efficient when using the native hardware byte order.

(Translator's note: at first I wondered why other data types would be faster. That is not the point: multibyte access is faster when the byte buffer's byte order matches the native order of the system, and slower when it does not. See the next paragraph.)
Hopefully, you're a little puzzled at this point as to why the ByteBuffer class would need a byte order setting at all. Bytes are bytes, right? Sure, but as you'll soon see in Section 2.4.4, ByteBuffer objects possess a host of convenience methods for getting and putting the buffer content as other primitive data types. The way these methods encode or decode the bytes is dependent on the ByteBuffer's current byte-order setting.
The byte-order setting of a ByteBuffer can be changed at any time by invoking order( ) with either ByteOrder.BIG_ENDIAN or ByteOrder.LITTLE_ENDIAN as an argument:
    public abstract class ByteBuffer extends Buffer
        implements Comparable
    {
        // This is a partial API listing

        public final ByteOrder order( )
        public final ByteBuffer order (ByteOrder bo)
    }
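As a rough illustration, the sketch below writes an int with the default big-endian setting, switches the same buffer to little-endian, and reads the same four bytes back. Changing the order does not touch the stored bytes, only how they are interpreted. The class and method names are made up for the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ChangeOrderDemo {
    // Writes value as big-endian, then reads the same bytes as little-endian.
    static int writeBigReadLittle(int value) {
        ByteBuffer buf = ByteBuffer.allocate(4); // new byte buffers start as BIG_ENDIAN
        buf.putInt(0, value);                    // absolute put; position is unchanged
        buf.order(ByteOrder.LITTLE_ENDIAN);      // stored bytes are left as they are
        return buf.getInt(0);                    // same bytes, decoded in reverse order
    }

    public static void main(String[] args) {
        System.out.println(ByteBuffer.allocate(4).order());                      // BIG_ENDIAN
        System.out.println(Integer.toHexString(writeBigReadLittle(0x037FB4C7))); // c7b47f03
    }
}
```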
If a buffer was created as a view of a ByteBuffer object (see Section 2.4.3), then the value returned by the order( ) method is the byte-order setting of the originating ByteBuffer at the time the view was created. The byte-order setting of the view cannot be changed after it's created and will not be affected if the original byte buffer's byte order is changed later.
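A small sketch of that behaviour, using an `IntBuffer` view as the example (view buffers are covered in Section 2.4.3). The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

class ViewOrderDemo {
    // Returns the view's byte order after the backing buffer's order is changed.
    static ByteOrder viewOrderAfterChange() {
        ByteBuffer bytes = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN);
        IntBuffer view = bytes.asIntBuffer(); // view captures LITTLE_ENDIAN here
        bytes.order(ByteOrder.BIG_ENDIAN);    // later change to the byte buffer
        return view.order();                  // still LITTLE_ENDIAN
    }

    public static void main(String[] args) {
        System.out.println(viewOrderAfterChange()); // LITTLE_ENDIAN
    }
}
```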
2.4.2 Direct Buffers

The most significant way in which byte buffers are distinguished from other buffer types is that they can be the sources and/or targets of I/O performed by Channels. If you were to skip ahead to Chapter 3 (hey! hey!), you'd see that channels accept only ByteBuffers as arguments.
As we saw in Chapter 1, operating systems perform I/O operations on memory areas. These memory areas, as far as the operating system is concerned, are contiguous sequences of bytes. It's no surprise then that only byte buffers are eligible to participate in I/O operations. Also recall that the operating system will directly access the address space of the process, in this case the JVM process, to transfer the data. This means that memory areas that are targets of I/O operations must be contiguous sequences of bytes. In the JVM, an array of bytes may not be stored contiguously in memory, or the Garbage Collector could move it at any time. Arrays are objects in Java, and the way data is stored inside that object could vary from one JVM implementation to another.
For this reason, the notion of a direct buffer was introduced. Direct buffers are intended for interaction with channels and native I/O routines. They make a best effort to store the byte elements in a memory area that a channel can use for direct, or raw, access by using native code to tell the operating system to drain or fill the memory area directly.
(Translator's note: this passage is still not entirely clear to me.)
Direct byte buffers are usually the best choice for I/O operations. By design, they support the most efficient I/O mechanism available to the JVM. Nondirect byte buffers can be passed to channels, but doing so may incur a performance penalty. It's usually not possible for a nondirect buffer to be the target of a native I/O operation. If you pass a nondirect ByteBuffer object to a channel for write, the channel may implicitly do the following on each call:
1. Create a temporary direct ByteBuffer object.
2. Copy the content of the nondirect buffer to the temporary buffer.
3. Perform the low-level I/O operation using the temporary buffer.
4. The temporary buffer object goes out of scope and is eventually garbage collected.
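The four steps above can be sketched in user code as follows. This is only an illustration of the idea, not what any particular channel implementation does internally. The helper `writeViaTemporaryDirect` is hypothetical, and the channel is a stand-in created with `Channels.newChannel` over a `ByteArrayOutputStream` so the example is self-contained.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

class TempDirectCopy {
    // Hypothetical helper mirroring steps 1-4; real channels may work differently.
    static int writeViaTemporaryDirect(WritableByteChannel channel, ByteBuffer heapBuffer)
            throws IOException {
        // 1. Create a temporary direct buffer sized to the pending data.
        ByteBuffer temp = ByteBuffer.allocateDirect(heapBuffer.remaining());
        // 2. Copy the nondirect content (via a duplicate, so the caller's
        //    position only advances by what is actually written).
        temp.put(heapBuffer.duplicate());
        temp.flip();
        // 3. Perform the I/O using the temporary buffer.
        int written = channel.write(temp);
        heapBuffer.position(heapBuffer.position() + written);
        // 4. temp goes out of scope here and is eventually garbage collected.
        return written;
    }

    // Pushes four bytes through the helper and returns what reached the sink.
    static byte[] demo() {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        WritableByteChannel channel = Channels.newChannel(sink);
        ByteBuffer heap = ByteBuffer.wrap(new byte[] {1, 2, 3, 4});
        try {
            while (heap.hasRemaining()) {
                writeViaTemporaryDirect(channel, heap);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(demo().length); // 4
    }
}
```

Allocating and copying on every call is exactly the overhead the list describes, which is why it matters mostly for buffers used repeatedly.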
This can potentially result in buffer copying and object churn on every I/O, which are exactly the sorts of things we'd like to avoid. However, depending on the implementation, things may not be this bad. The runtime will likely cache and reuse direct buffers or perform other clever tricks to boost throughput. If you're simply creating a buffer for one-time use, the difference is not significant. On the other hand, if you will be using the buffer repeatedly in a high performance scenario, you're better off allocating direct buffers and reusing them.
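A minimal sketch of the "allocate once, reuse many times" advice follows. The buffer size of 1024 bytes, the class name, and the in-memory sink are assumptions made for the example; whether this actually outperforms a heap buffer depends on the JVM and operating system.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

class ReusedDirectBuffer {
    // Writes every chunk through a single direct buffer that is allocated once.
    // Assumes each chunk fits in the buffer (at most 1024 bytes).
    static byte[] writeChunks(byte[][] chunks) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        WritableByteChannel channel = Channels.newChannel(sink);
        ByteBuffer buf = ByteBuffer.allocateDirect(1024); // setup cost paid once
        try {
            for (byte[] chunk : chunks) {
                buf.clear();   // reset position and limit for refilling
                buf.put(chunk);
                buf.flip();    // prepare the filled region for draining
                while (buf.hasRemaining()) {
                    channel.write(buf);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        byte[] out = writeChunks(new byte[][] {{1, 2}, {3}});
        System.out.println(out.length); // 3
    }
}
```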
Direct buffers are optimal for I/O, but they may be more expensive to create than nondirect byte buffers. The memory used by direct buffers is allocated by calling through to native, operating system-specific code, bypassing the standard JVM heap. Setting up and tearing down direct buffers could be significantly more expensive than heap-resident buffers, depending on the host operating system and JVM implementation. The memory-storage areas of direct buffers are not subject to garbage collection because they are outside the standard JVM heap.
The performance tradeoffs of using direct versus nondirect buffers can vary widely by JVM, operating system, and code design. By allocating memory outside the heap, you may subject your application to additional forces of which the JVM is unaware. When bringing additional moving parts into play, make sure that you're achieving the desired effect. I recommend the old software maxim: first make it work, then make it fast. Don't worry too much about optimization up front; concentrate first on correctness. The JVM implementation may be able to perform buffer caching or other optimizations that will give you the performance you need without a lot of unnecessary effort on your part. (As Donald Knuth put it: "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.")

(Translator's note: Knuth is another giant of the field, and this post is a tribute to him as well.)
A direct ByteBuffer is created by calling ByteBuffer.allocateDirect( ) with the desired capacity, just like the allocate( ) method we covered earlier. Note that wrapped buffers, those created with one of the wrap( ) methods, are always non-direct.
    public abstract class ByteBuffer extends Buffer
        implements Comparable
    {
        // This is a partial API listing

        public static ByteBuffer allocate (int capacity)
        public static ByteBuffer allocateDirect (int capacity)
        public abstract boolean isDirect( );
    }
All buffers provide a boolean method named isDirect( ) to test whether a particular buffer is direct. While ByteBuffer is the only type that can be allocated as direct, isDirect( ) could be true for nonbyte view buffers if the underlying buffer is a direct ByteBuffer. This leads us to...
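Before moving on, here is a small sketch of the isDirect( ) behaviour described above for allocated, wrapped, and view buffers. The class name and the capacity of 16 bytes are arbitrary choices for the example.

```java
import java.nio.ByteBuffer;

class IsDirectDemo {
    // Reports isDirect() for: allocate, allocateDirect, wrap,
    // an int view of a direct buffer, and an int view of a heap buffer.
    static boolean[] flags() {
        ByteBuffer heap = ByteBuffer.allocate(16);
        ByteBuffer direct = ByteBuffer.allocateDirect(16);
        ByteBuffer wrapped = ByteBuffer.wrap(new byte[16]);
        return new boolean[] {
            heap.isDirect(),                 // false: lives on the JVM heap
            direct.isDirect(),               // true: allocated outside the heap
            wrapped.isDirect(),              // false: wrapped buffers are never direct
            direct.asIntBuffer().isDirect(), // true: view of a direct byte buffer
            heap.asIntBuffer().isDirect()    // false: view of a heap byte buffer
        };
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(flags()));
        // [false, true, false, true, false]
    }
}
```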